[CF-metadata] some concerns about the "ensemble axis" proposal from Steve Hankin on 2007-03-06 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Steve Hankin <Steven.C.Hankin>
Date: Tue, 06 Mar 2007 09:30:29 -0800

Hi All,

I think that we have all agreed that there is a lot of room for
improvement in our discussion process. I doubt that any of us would
accept the depth of reasoning and testing that passes for "consensus" in
our CF discussions as an adequate level of analysis to make strategic
decisions in the projects that we manage ourselves. There is an
in-built problem in email-based, consensus-driven processes;
individuals strongly motivated by a particular new feature will be
vocal, whereas those not concerned with the feature will tune out. The
result is a bias towards "yes" votes. Considerations of the stability
and broad interoperability of the standard generally get short shrift
when measured against enthusiasm for new features. It is why we need to
constantly remind ourselves of the quotation that Russ Rew brought to
the table:

    /To create quality software [or standards], the ability to say "no"
    is usually far more important than the ability to say "yes."/

There are at least three voices expressing reservations on this issue.
There are perhaps 5 expressing go-ahead. Given the in-built biases of
the process, does this tally really represent a consensus?

Here are two questions I would like to see discussed and answered:

   1. What *is* the solution for handling multi-model grids?
      If multi-grid ensembles are a problem that we have to solve, then
      we should understand our approach before agreeing on the solution
      to the simpler sub-case of uniform grid ensembles.
   2. IMHO there is a clear and compelling need for higher order
      metadata to organize collections of CF files. Our community is
      seeing very wide use of CF in combination with THREDDS catalogs.
      Bryan has expressed reservations about the use of THREDDS
      catalogs. I do not share these reservations. In fact I think the
      current discussion illustrates the need for tools at the THREDDS
      level. What are the underlying issues? How do others feel?

Doubtless there are more questions in need of a thorough airing before
we come to a final conclusion. Shouldn't our CF discussion process
require us to make an explicit enumeration of such concerns and address
them?

    - Steve

P.S. For the record, I am *not* implacably opposed to the addition of an
ensemble axis to CF. I agree that it has significant merits.

============================================

Bryan Lawrence wrote:
> I agree with Jonathan's summary. I made my points about Thredds and
> netcdf in an email on the 1st of November. I don't think they help us
> here and now, and we have to get on and do things.
>
> Bryan
>
>
> On Sun, 2007-03-04 at 16:35 +0000, Jonathan Gregory wrote:
>
>> Dear all
>>
>> Many people have made thoughtful contributions to this discussion. My
>> impression is that through the earlier discussions we reached a consensus on
>> a convention for handling ensembles. There was one exception to this, namely
>> the issue of whether to use the standard_name attribute for new purposes, or
>> define an alternative attribute, but that issue doesn't affect the structure
>> proposed. This convention introduces an extra dimension and allows a combination
>> of auxiliary coordinate variables to identify the ensemble members along that
>> dimension. The convention is adequate for some needs that already exist. That
>> is why Paco raised the requirement in the first place. Balaji gives two use-
>> cases for it. The combination of auxiliary coordinates is really doing the job
>> of the "lower-level" identification (members within the ensemble) that Balaji
>> describes, but early on in the previous discussion it appeared useful to have
>> this factorised into various attributes rather than in a single string.
>>
>> I agree that because this axis doesn't have a meaningful monotonic coordinate
>> variable, you can't extract a range from it. That's because it's discrete, not
>> continuous. This is not an unprecedented situation. If you have a data variable
>> containing timeseries or vertical profiles at scattered points, or trajectories
>> from a number of parcels, you will have a similar kind of index dimension, with
>> auxiliary coordinates providing locations and other identification. You can
>> usefully subset such an axis, as people have commented. Subsetting may be a bit
>> more awkward for analysis software, but it is an essential operation. You might
>> do it with continuous axes as well (e.g. to extract the Januaries from a time
>> axis of months).
>>
>> It is also a limitation that the data have to be on the same space-time grid.
>> For many analysis operations, however, this is necessary, even if the data were
>> not generated like that. In Balaji's case of the AR4 archive, the models do not
>> have the same grid. It is necessary to put them on a common grid nonetheless in
>> order to produce some familiar diagnostics such as appear in the AR4 and recent
>> papers, like the average lat-lon field of surface temperature change, or the
>> time-depth field of ocean temperature change averaged over models with
>> volcanoes. Even if the archive did not originally hold the data like that, I
>> am sure that somewhere along the way the ensemble dimension will be needed.
>>
>> In regard to John's point, it's true that models don't generate all the same
>> quantities. But one purpose of standard names is to indicate which quantities
>> can be regarded as equivalent among models. You would put surface temperature
>> from various models in a single data variable with an ensemble dimension if
>> the data variables from the various models already had the same standard_name.
>>
>> It is certainly true that it would be good to have a solution that *could*
>> deal with data on different grids. We can't do that with the proposal we have
>> now. It may well be that netCDF-4 will offer a good technical solution for
>> that. However, I don't think we ought to wait for it. We have a good solution
>> now for cases which are of immediate practical relevance. If we find later that
>> a more general solution is efficient for these cases too, we can support both.
>> That would be fine. CF is always going to evolve for new needs and technology.
>>
>> Best wishes
>>
>> Jonathan
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://www.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>
>
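
To make the index-dimension idea in Jonathan's quoted message concrete, here is a minimal sketch of subsetting a discrete ensemble axis through its auxiliary coordinates. It assumes plain NumPy arrays standing in for netCDF variables; the variable names (tas, model, realization, has_volcanoes) and all values are hypothetical and are not part of any agreed CF convention.

```python
import numpy as np

# Hypothetical ensemble: 4 members on one shared grid of
# 24 monthly time steps x 2 latitudes x 2 longitudes.
n_member, n_time, n_lat, n_lon = 4, 24, 2, 2
tas = np.arange(n_member * n_time * n_lat * n_lon, dtype=float).reshape(
    n_member, n_time, n_lat, n_lon
)

# Auxiliary coordinates along the ensemble dimension. The dimension itself
# is only an index; these arrays identify each member (illustrative names).
model = np.array(["modelA", "modelA", "modelB", "modelC"])
realization = np.array([1, 2, 1, 1])
has_volcanoes = np.array([True, True, False, True])

# A range cannot be extracted from a discrete axis, but it can be subset
# with a boolean mask built from the auxiliary coordinates.
volcanic = tas[has_volcanoes]

# Average over the selected members, as in a multi-model mean field.
ensemble_mean = volcanic.mean(axis=0)

# The same masking works on a continuous axis, e.g. picking the Januaries
# out of a monthly time axis.
month_of_year = np.arange(n_time) % 12 + 1
januaries = tas[:, month_of_year == 1]

print(volcanic.shape, ensemble_mean.shape, januaries.shape)
```

The mask-based selection only works because every member shares the same space-time grid, which is the limitation discussed in the quoted message.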

-- 
Steve Hankin, NOAA/PMEL -- Steven.C.Hankin at noaa.gov
7600 Sand Point Way NE, Seattle, WA 98115-0070
ph. (206) 526-6080, FAX (206) 526-6744

Received on Tue Mar 06 2007 - 10:30:29 GMT
