
[CF-metadata] some concerns about the "ensemble axis" proposal

From: Steve Hankin <Steven.C.Hankin>
Date: Thu, 22 Feb 2007 22:32:42 -0700

Hi all,

While at a meeting together recently, Balaji and I had a chance to
discuss CF issues. We realized that we shared a number of concerns
about the ensemble axis proposal that has been under discussion on the
CF email list. We have broadly agreed on the need to make CF
discussions thorough and systematic, so although it is a shame to slow
things down :-(, Balaji and I hope there will be general agreement that
airing these concerns is appropriate. When the CF discussion Wiki
enters full service soon, we can migrate this topic there and
hopefully bring greater clarity to it in the process.

Below, the concerns are raised in two parts:

   1. briefly enumerated
   2. discussed in greater depth

    - Steve & Balaji

====================

Concerns with the proposal for a CF "ensemble axis":

   1. The ensemble axis proposal does not solve the general multi-model
      ensemble problem.
   2. A netCDF-style ensemble axis is a marginal model for the
      underlying problem.
   3. In order to accommodate the ensemble axis concept other basic
      concepts of the CF data model must be extended and made more
      complicated.
   4. In realistic data management scenarios the ensemble axis will not
      be a sufficient solution to the problem; "aggregation servers"
      will be needed as well. (And when aggregation servers are
      introduced into the problem space, there are alternative
      approaches that should be considered, too.)
   5. If implemented, the proposal will create significant barriers to
      interoperability.
   6. The proposal potentially compromises the future quality of CF
      because netCDF4 will offer solutions that model the problem properly.

Discussion of the concerns in greater detail (same numbering):

   1. The ensemble axis proposal does not solve the general multi-model
      ensemble problem.
          * In earlier discussions it was stated that the ensemble axis
            is intended to address multi-model ensembles. But the model
            runs in a multi-model ensemble will in general be on a
            number of differing grids. The proposal addresses only the
            limited sub-case in which all models have been re-gridded to
            the same grid. This leaves unanswered how CF can support
            ensembles on multiple grids. We should explore the answer
            to the general question before committing to the specialized
            solution.
   2. A netCDF-style ensemble axis is a marginal model for the
      underlying problem.
          * The "ensemble axis" is not an ordered axis. So when clients
            work with a subset of the models in an ensemble, they will
            often not be accessing contiguous ranges of indices on the
            axis. NetCDF dimensions are ordered, and the API can only
            provide direct support for contiguous (or regularly strided)
            ranges on a dimension. So the ensemble axis proposal will not
            provide the usual and expected benefits of a netCDF dimension.
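To make the access-pattern point concrete, here is a minimal sketch in
plain Python (no netCDF library involved). The member labels and their
order on the axis are hypothetical. Picking members by name yields
scattered indices, and the helper counts how many separate (start, count)
hyperslab requests would be needed to read them along the ensemble
dimension.

```python
from typing import List, Sequence, Tuple


def contiguous_runs(indices: Sequence[int]) -> List[Tuple[int, int]]:
    """Group indices into sorted (start, count) runs of consecutive values.

    Each run corresponds to one contiguous hyperslab request along the
    ensemble dimension; a scattered selection needs several of them.
    """
    runs: List[Tuple[int, int]] = []
    for i in sorted(set(indices)):
        if runs and runs[-1][0] + runs[-1][1] == i:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)  # extend the current run
        else:
            runs.append((i, 1))  # start a new run
    return runs


# Hypothetical ensemble axis: member order is arbitrary (e.g. arrival order).
members = ["model_A", "model_B", "model_C", "model_D", "model_E"]
wanted = {"model_A", "model_C", "model_D"}
idx = [i for i, m in enumerate(members) if m in wanted]
print(idx)                   # [0, 2, 3]
print(contiguous_runs(idx))  # [(0, 1), (2, 2)]
```

Because the order of members carries no meaning, nothing guarantees that a
scientifically sensible subset will collapse into a single run.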
   3. In order to accommodate the ensemble axis concept other basic
      concepts of the CF data model must be extended and made more
      complicated.
          * For multi-model ensembles, the members of the proposed
            ensemble axis must be identified by some kind of unique
            coordinate value. Since there is no associated ordered
            (numerical) coordinate, the ensemble members must be
            identified by metadata (typically strings) taken from the
            individual model runs (i.e. global attributes). In fact, it
            will often be the case that no single global attribute
            uniquely identifies a model. Rather, it will be a
            combination of metadata such as institution name, model
            name, principal investigator (PI) name, and so on.
            Fundamental CF 1.0 coordinate concepts must be altered, and
            the machinery must be extended and made more complex, in
            order to support these coordinate mappings.
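The "no single identifying attribute" problem can be illustrated with a
small sketch. The attribute values are invented for illustration;
"institution" and "source" are CF 1.0 global attributes, while "pi_name"
is a hypothetical, non-standard attribute. The sketch searches for the
smallest combination of attributes that distinguishes every run, which is
exactly the kind of composite key an ensemble coordinate would need.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def is_unique_key(runs: List[Dict[str, str]], keys: Sequence[str]) -> bool:
    """True if the tuple of the given attribute values differs for every run."""
    seen = set()
    for run in runs:
        key = tuple(run.get(name, "") for name in keys)
        if key in seen:
            return False
        seen.add(key)
    return True


def smallest_unique_key(
    runs: List[Dict[str, str]], candidates: Sequence[str]
) -> Optional[Tuple[str, ...]]:
    """Smallest combination of attribute names that identifies every run."""
    for n in range(1, len(candidates) + 1):
        for combo in combinations(candidates, n):
            if is_unique_key(runs, combo):
                return combo
    return None  # even all attributes together are ambiguous


# Hypothetical global attributes of three model runs.
runs = [
    {"institution": "Lab X", "source": "model_A", "pi_name": "Smith"},
    {"institution": "Lab X", "source": "model_B", "pi_name": "Smith"},
    {"institution": "Lab Y", "source": "model_A", "pi_name": "Jones"},
]
print(smallest_unique_key(runs, ["institution", "source", "pi_name"]))
# ('institution', 'source')
```

No single attribute works here, and which combination works depends on
which runs happen to be in the ensemble.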
   4. In realistic data management scenarios the ensemble axis will not
      be a sufficient solution to the problem; "aggregation servers"
      will be needed as well. (And when aggregation servers are
      introduced into the problem space, there are alternative
      approaches that should be considered, too.)
          * Ensemble members will often be created (at various
            institutions) without awareness of one another. The
            individual model outputs will be contained in separate files
            that must be aggregated to form an axis.
          * The size of multi-model ensemble files would be unwieldy (an
            understatement!).
          * Ensemble members will not all become available at the same
            time. They will become available as model runs at different
            institutions are completed.
          * Ensemble members may be added or removed based upon "social"
            considerations (who is participating in the multi-model
            project).
          * The attributes that identify each model will presumably be
            global attributes of that model run. A new generation of
            aggregation capabilities will be needed in order to
            "promote" these global attributes into arrays along the
            ensemble axis dimension. Further levels of standardization
            of global attributes, beyond the CF 1.0 specification, will
            likely be needed in order to ensure that meaningful
            promotions are possible.
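The "promotion" step an aggregation server would have to perform can be
sketched as follows. This is plain Python under stated assumptions: the
per-file global attributes are given as dictionaries (how they are read
from the files is left out), and all file contents are hypothetical. The
third file uses non-standard attribute names, which is why further
standardization would be needed.

```python
from typing import Dict, List


def promote_global_attributes(
    files: List[Dict[str, str]], names: List[str], fill: str = ""
) -> Dict[str, List[str]]:
    """Promote selected global attributes into one array per attribute.

    Position k of each array describes ensemble member k, in the order the
    files are listed; a missing attribute is recorded as `fill`.
    """
    return {name: [f.get(name, fill) for f in files] for name in names}


def incomplete_members(promoted: Dict[str, List[str]], fill: str = "") -> List[int]:
    """Indices of members for which at least one promoted attribute is missing."""
    n = len(next(iter(promoted.values()), []))
    return [k for k in range(n) if any(arr[k] == fill for arr in promoted.values())]


# Hypothetical global attributes from three separately produced files.
files = [
    {"institution": "Lab X", "source": "model_A"},
    {"institution": "Lab Y", "source": "model_B"},
    {"Institution": "Lab Z", "model": "model_C"},  # non-standard names
]
promoted = promote_global_attributes(files, ["institution", "source"])
print(promoted["source"])            # ['model_A', 'model_B', '']
print(incomplete_members(promoted))  # [2]

# A late-arriving member is one more file; the arrays must be rebuilt.
files.append({"institution": "Lab W", "source": "model_D"})
print(promote_global_attributes(files, ["source"])["source"])
# ['model_A', 'model_B', '', 'model_D']
```

Every addition or removal of a member changes the promoted arrays, so the
aggregation has to be recomputed rather than written once.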
   5. If implemented, the proposal will create significant barriers to
      interoperability.
          * CF 1.0 has created the highest level of model-sharing
            interoperability that our community has ever seen.
            Interoperability is arguably the greatest contribution that
            CF has made. (It is for this reason, for example, that ESRI
            products have begun to support CF.) Many clients that are
            currently capable of reading CF 1.0 will not be able to
            access model outputs that use this proposal. The scope of
            this problem, weighing the benefits against the losses,
            deserves to be discussed and assessed.
   6. The proposal potentially compromises the future quality of CF
      because netCDF4 will offer solutions that model the problem properly.
          * The "groups" concept of netCDF4 resolves most of the
            problems identified above. It is arguably a significantly
            better technical solution. If the netCDF3 ensemble axis
            proposal is accepted, then when netCDF4 becomes available,
            CF will have to accept one of the following outcomes, all
            of which are undesirable:
                o not to utilize the new and superior options that
                  netCDF4 offers for encoding ensembles;
                  or
                o to utilize the superior options that netCDF4 offers
                  and accept that CF defines redundant, complex
                  solutions to the ensembles problem;
                  or
                o to abandon the proposal that is based upon netCDF3 and
                  thereby orphan the ensemble files and aggregations that
                  have already been created.
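The contrast between a single ensemble dimension and a group per member
can be sketched with a toy in-memory model. This is plain Python, not the
netCDF4 API: the group paths, grid sizes and attribute values are
hypothetical. It only shows that stacking members along one dimension
requires a shared grid, whereas a hierarchy in which each member carries
its own grid shape and attributes does not.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Member:
    """One ensemble member's field on its own native grid (nlat, nlon)."""
    grid_shape: Tuple[int, int]
    attributes: Dict[str, str] = field(default_factory=dict)


def common_grid(members: Dict[str, Member]) -> Optional[Tuple[int, int]]:
    """Return the shared grid shape if all members agree, else None.

    A single ensemble dimension needs this shared shape; a group per
    member does not, because each group keeps its own dimensions.
    """
    shapes = {m.grid_shape for m in members.values()}
    return shapes.pop() if len(shapes) == 1 else None


# Hypothetical hierarchy: one group per member under "/ensemble".
ensemble = {
    "/ensemble/model_A": Member((90, 144), {"institution": "Lab X"}),
    "/ensemble/model_B": Member((64, 128), {"institution": "Lab Y"}),
}
print(common_grid(ensemble))  # None: no single ensemble axis without regridding

# Adding or dropping a member touches only its own group.
ensemble["/ensemble/model_C"] = Member((45, 72), {"institution": "Lab Z"})
del ensemble["/ensemble/model_B"]
print(sorted(ensemble))  # ['/ensemble/model_A', '/ensemble/model_C']
```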



Received on Thu Feb 22 2007 - 22:32:42 GMT

