Hi all,
While at a meeting together recently, Balaji and I had a chance to
discuss CF issues. We realized that we share a number of concerns
about the ensemble axis proposal that has been under discussion on the
CF email list. We have broadly agreed on the need to make CF
discussions thorough and systematic, so while it is a shame to slow
things down :-( , Balaji and I hope there will be general support for
airing these concerns. Once the CF discussion Wiki is in full service,
we can migrate this topic to that venue and hopefully gain greater
clarity in the process.
The concerns are raised below in two parts:
1. briefly enumerated
2. discussed in greater depth
- Steve & Balaji
====================
Concerns with the proposal for a CF "ensemble axis":
1. The ensemble axis proposal does not solve the general multi-model
ensemble problem.
2. A netCDF-style ensemble axis is a marginal model for the
underlying problem.
3. In order to accommodate the ensemble axis concept other basic
concepts of the CF data model must be extended and made more
complicated.
4. In realistic data management scenarios the ensemble axis will not
be a sufficient solution to the problem; "aggregation servers"
will be needed as well. (And when aggregation servers are
introduced into the problem space, there are alternative
approaches that should be considered, too.)
5. If implemented the proposal will create significant barriers to
interoperability.
6. The proposal potentially compromises the future quality of CF
because netCDF4 will offer solutions that model the problem properly.
Discussion of the concerns in greater detail (same numbering):
1. The ensemble axis proposal does not solve the general multi-model
ensemble problem.
* In earlier discussions it was stated that the ensemble axis
is intended to address multi-model ensembles. But the model
runs in a multi-model ensemble will in general be on a
number of differing grids. The proposal addresses only the
limited sub-case in which all models have been re-gridded to
the same grid. This leaves unanswered how CF can support
ensembles on multiple grids. We should explore the answer
to the general question before committing to the specialized
solution.
2. A netCDF-style ensemble axis is a marginal model for the
underlying problem.
* The "ensemble axis" is not an ordered axis. So when clients
are working with models from an ensemble they will often not
be accessing contiguous ranges of indices on the axis.
NetCDF dimensions are ordered and can only provide direct
API support for contiguous ranges on a dimension. So the
ensemble axis proposal will not provide the usual and
expected benefits of a netCDF dimension.
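To make the contiguity point concrete, here is a minimal sketch in plain Python. It assumes, as the bullet above describes, that one read request covers one contiguous range of indices along a dimension, expressed as a (start, count) pair. The helper name `contiguous_runs` and the example selection are hypothetical and for illustration only.

```python
def contiguous_runs(indices):
    """Group indices into sorted (start, count) runs of consecutive values."""
    runs = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][0] + runs[-1][1]:
            # i directly follows the last run, so extend that run by one.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # i does not follow the last run, so a new run is needed.
            runs.append((i, 1))
    return runs


# Hypothetical selection: members 0, 1, 2, 7 and 12 of an ensemble.
selected = [0, 1, 2, 7, 12]
print(contiguous_runs(selected))  # [(0, 3), (7, 1), (12, 1)]
```

Under that assumption, a client interested in these five members would need three separate range reads rather than one, which is the sense in which an unordered ensemble axis loses the usual benefit of a dimension.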
3. In order to accommodate the ensemble axis concept other basic
concepts of the CF data model must be extended and made more
complicated.
* For multi-model ensembles the members of the proposed
ensemble axis must be identified by some kind of unique
coordinate value. Since there is no associated ordered
(numerical) coordinate, the ensemble members must be
identified by metadata (typically strings) from the
individual model runs (i.e. global attributes). In fact, it
will often be the case that there is no single global
attribute that uniquely identifies a model. Rather it will
be a combination of metadata such as institution name, model
name, PI name, and so on. Fundamental CF1.0 coordinate
concepts must be altered; machinery has to be extended and
made more complex in order to support these coordinate mappings.
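The identification problem can be sketched as follows. The attribute names (`institution`, `model`, `pi`) and all values are hypothetical; CF1.0 does not define them as an identifying key. The sketch only shows that a single attribute may fail to distinguish ensemble members where a combination succeeds.

```python
# Hypothetical global attributes from four model runs (illustrative only).
runs = [
    {"institution": "Inst A", "model": "M1", "pi": "Smith"},
    {"institution": "Inst A", "model": "M2", "pi": "Smith"},
    {"institution": "Inst B", "model": "M1", "pi": "Jones"},
    {"institution": "Inst B", "model": "M1", "pi": "Lee"},
]


def is_unique_key(runs, attrs):
    """True if the chosen attributes, taken together, differ for every run."""
    keys = [tuple(run.get(a) for a in attrs) for run in runs]
    return len(set(keys)) == len(keys)


def label(run, attrs, sep="/"):
    """Build a string 'coordinate value' from the chosen attributes."""
    return sep.join(str(run.get(a, "")) for a in attrs)


print(is_unique_key(runs, ["model"]))                       # False
print(is_unique_key(runs, ["institution", "model"]))        # False
print(is_unique_key(runs, ["institution", "model", "pi"]))  # True
print(label(runs[3], ["institution", "model", "pi"]))       # Inst B/M1/Lee
```

Which attributes form a unique key depends on the particular collection of runs, so the mapping from metadata to a coordinate value cannot be fixed in advance without further conventions.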
4. In realistic data management scenarios the ensemble axis will not
be a sufficient solution to the problem; "aggregation servers"
will be needed as well. (And when aggregation servers are
introduced into the problem space, there are alternative
approaches that should be considered, too.)
* Ensemble members will often be created (at various
institutions) without awareness of one another. The
individual model outputs will be contained in separate files
that must be aggregated to form an axis.
* The size of multi-model ensemble files would be unwieldy (an
understatement!).
* Ensemble members will not all become available at the same
time. They will become available as model runs at different
institutions are completed.
* Ensemble members may be added or removed based upon "social"
considerations (who is participating in the multi-model
project).
* The attributes that identify each model will be (presumably)
global attributes of that model run. A new generation of
aggregation capabilities will be needed in order to
"promote" these global attributes into arrays on the
ensemble axis dimension. Further levels of standardization
of global attributes, beyond the CF1.0 specifications, will
likely be needed in order to ensure that meaningful
promotions are possible.
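As a rough illustration of what "promoting" global attributes could involve, the sketch below turns per-file attribute dictionaries into arrays aligned with the ensemble axis. It is not an existing aggregation-server feature. The function name `promote_global_attributes`, the attribute names, and the empty-string fill value are all assumptions.

```python
def promote_global_attributes(member_attrs, names, missing=""):
    """Return {name: [value for each member]} in ensemble-axis order.

    member_attrs holds one dict of global attributes per ensemble member.
    A member that lacks an attribute gets the `missing` placeholder.
    """
    return {
        name: [attrs.get(name, missing) for attrs in member_attrs]
        for name in names
    }


# Hypothetical global attributes from three independently produced files.
members = [
    {"institution": "Inst A", "model": "M1"},
    {"institution": "Inst B", "model": "M2", "pi": "Jones"},
    {"institution": "Inst C"},  # this file never recorded a model name
]

promoted = promote_global_attributes(members, ["institution", "model"])
print(promoted["institution"])  # ['Inst A', 'Inst B', 'Inst C']
print(promoted["model"])        # ['M1', 'M2', '']
```

The gap in the `model` array for the third member shows why further standardization of global attributes would be needed before such promotions can be relied upon.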
5. If implemented the proposal will create significant barriers to
interoperability.
* CF1.0 has created the highest level of model-sharing
interoperability that our community has ever seen.
Interoperability is arguably the greatest contribution that
CF has made. (It is for this reason, for example, that ESRI
products have begun to support CF.) Many clients that are
currently capable of reading CF 1.0 will not be able to
access model outputs that utilize this proposal. The scope
of this problem -- weighing the benefits against the losses
-- deserves to be discussed and assessed.
6. The proposal potentially compromises the future quality of CF
because netCDF4 will offer solutions that model the problem properly.
* The "groups" concept of netCDF4 resolves most of the
problems identified above. It is arguably a significantly
better technical solution. If the netCDF3 ensemble axis
proposal is accepted, then when netCDF4 becomes available,
CF will have to choose one of the following, all
of which are undesirable outcomes:
o not to utilize the new and superior options that
netCDF4 offers for encoding ensembles
or
o to utilize the superior options that netCDF4 offers
and accept that CF defines redundant, complex
solutions to the ensembles problem
or
o to abandon the proposal that is based upon netCDF3 and
thereby orphan the ensemble files and aggregations that
have been created.
Received on Thu Feb 22 2007 - 22:32:42 GMT