[CF-metadata] some concerns about the "ensemble axis" proposal

From: Jonathan Gregory <j.m.gregory>
Date: Sun, 4 Mar 2007 16:35:19 +0000

Dear all

Many people have made thoughtful contributions to this discussion. My
impression is that through the earlier discussions we reached a consensus on
a convention for handling ensembles. There was one exception to this, namely
the issue of whether to use the standard_name attribute for new purposes, or
define an alternative attribute, but that issue doesn't affect the structure
proposed. This convention introduces an extra dimension and allows a combination
of auxiliary coordinate variables to identify the ensemble members along that
dimension. The convention is adequate for some needs that already exist. That
is why Paco raised the requirement in the first place. Balaji gives two use-
cases for it. The combination of auxiliary coordinates is really doing the job
of the "lower-level" identification (members within the ensemble) that Balaji
describes, but early on in the previous discussion it appeared useful to have
this factorised into various attributes rather than in a single string.

I agree that because this axis doesn't have a meaningful monotonic coordinate
variable, you can't extract a range from it. That's because it's discrete, not
continuous. This is not an unprecedented situation. If you have a data variable
containing timeseries or vertical profiles at scattered points, or trajectories
from a number of parcels, you will have a similar kind of index dimension, with
auxiliary coordinates providing locations and other identification. You can
usefully subset such an axis, as people have commented. Subsetting may be a bit
more awkward for analysis software, but it is an essential operation. You might
do it with continuous axes as well (e.g. to extract the Januaries from a time
axis of months).
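
As a rough illustration of subsetting a discrete index dimension through its
auxiliary coordinates, here is a minimal Python sketch. The member labels, the
values, and the helper select_members are hypothetical, invented only for this
example; they are not part of the proposed convention.

```python
# Hypothetical auxiliary coordinates along a discrete "ensemble" dimension.
# All names and values here are illustrative assumptions.
model = ["modelA", "modelA", "modelB", "modelB", "modelC"]
realization = [1, 2, 1, 2, 1]

# One value per ensemble member (a stand-in for a full space-time field).
data = [14.1, 14.3, 13.8, 13.9, 14.6]


def select_members(aux, predicate):
    """Return the indices whose auxiliary coordinate value satisfies predicate."""
    return [i for i, value in enumerate(aux) if predicate(value)]


# There is no monotonic coordinate to take a range from, so we select by
# matching auxiliary coordinate values instead.
model_b = select_members(model, lambda name: name == "modelB")
subset = [data[i] for i in model_b]

# The same index-based selection works on a continuous axis, e.g. picking
# the Januaries out of a monthly time axis of (year, month) pairs.
months = [(2000 + k // 12, k % 12 + 1) for k in range(36)]
januaries = [i for i, (_, month) in enumerate(months) if month == 1]
```

In both cases the selection is a list of indices rather than a coordinate
range, which is what makes it slightly more awkward than slicing.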

It is also a limitation that the data have to be on the same space-time grid.
For many analysis operations, however, this is necessary, even if the data were
not generated like that. In Balaji's case of the AR4 archive, the models do not
have the same grid. It is necessary to put them on a common grid nonetheless in
order to produce some familiar diagnostics such as appear in the AR4 and recent
papers, like the average lat-lon field of surface temperature change, or the
time-depth field of ocean temperature change averaged over models with
volcanoes. Even if the archive did not originally hold the data like that, I
am sure that somewhere along the way the ensemble dimension will be needed.
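
To make the kind of diagnostic described above concrete, the following Python
sketch averages over an ensemble dimension once the members share a grid. The
tiny 2x2 fields, the has_volcanoes flag, and the function ensemble_mean are all
assumptions made up for illustration; regridding onto the common grid is taken
as already done.

```python
# Hypothetical ensemble already on a common grid: fields[member][lat][lon].
fields = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 3.0], [4.0, 5.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]

# Assumed auxiliary coordinate: whether each model included volcanic forcing.
has_volcanoes = [True, False, True]


def ensemble_mean(fields, members):
    """Average the selected members point by point over the shared grid."""
    nlat = len(fields[0])
    nlon = len(fields[0][0])
    return [
        [sum(fields[m][j][i] for m in members) / len(members) for i in range(nlon)]
        for j in range(nlat)
    ]


# Subset the ensemble dimension, then average over what remains.
volcanic = [m for m, flag in enumerate(has_volcanoes) if flag]
mean_volcanic = ensemble_mean(fields, volcanic)
mean_all = ensemble_mean(fields, range(len(fields)))
```

The point-by-point sum only makes sense because every member has the same
shape, which is the restriction the ensemble dimension imposes.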

In regard to John's point, it's true that models don't generate all the same
quantities. But one purpose of standard names is to indicate which quantities
can be regarded as equivalent among models. You would put surface temperature
from various models in a single data variable with an ensemble dimension if
the data variables from the various models already had the same standard_name.

It is certainly true that it would be good to have a solution that *could*
deal with data on different grids. We can't do that with the proposal we have
now. It may well be that netCDF-4 will offer a good technical solution for
that. However, I don't think we ought to wait for it. We have a good solution
now for cases which are of immediate practical relevance. If we find later that
a more general solution is efficient for these cases too, we can support both.
That would be fine. CF is always going to evolve for new needs and technology.

Best wishes

Jonathan
Received on Sun Mar 04 2007 - 09:35:19 GMT
