[CF-metadata] CF and multi-forecast system ensemble data

From: Jonathan Gregory <j.m.gregory>
Date: Sat, 11 Nov 2006 13:10:05 +0000

Dear All

As I was away for a week, I've just read the last two weeks of emails on this
thread and would offer the following comments:

* Do we agree to define some new global attributes, which could also be
attributes of data variables in the case where aggregation is not done? I
am not sure we have finalised the requirements for these.

* I agree with the several arguments that the simple aggregation solution of
adding a realization dimension is the best for the moment. It might be that
the possibility of aggregating groups of heterogeneous objects in netCDF-4
will turn out to be useful, but it appears we do not currently have a
requirement for that flexibility. When someone does, we can define a
convention. CF already explicitly permits going beyond 4D, and I hope software
will be adapted to deal with that, as it appears GrADS is.
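
For concreteness, a minimal CDL sketch of what adding a realization dimension
might look like (the dimension sizes and the five-dimensional variable are
hypothetical, just to illustrate going beyond 4D):

  dimensions:
    realization = 5 ;   // one index per ensemble member or forecasting system
    level = 17 ;
    time = 8 ;
    lat = 73 ;
    lon = 96 ;
  variables:
    int realization(realization) ;   // simple index coordinate for the ensemble axis
    float temperature(realization, level, time, lat, lon) ;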

* I wonder if there is sometimes confusion between standard names and the
contents of variables. Defining a new standard name is not the same as
standardising the values that a variable with this name can take. Most
standard names, of course, are for continuous physical variables whose
contents need no standardisation. Standardisation may be needed for variables
which take one of a discrete set of values. flag_values and flag_meanings
(CF 3.5) are a mechanism for doing this which allows the file to be
self-describing. Otherwise, we have to use external tables. So far we have
only done this for names of regions, but we have some other standard names for
quantities which might need to be standardised, such as land cover. A
realization dimension providing information on the provenance of data could
fall in this category too, as discussed.
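
To illustrate, a minimal sketch of the flag_values/flag_meanings mechanism for
a discrete-valued quantity; the variable name and the particular values and
meanings are invented for the example:

  byte surface_type(lat, lon) ;
    surface_type:flag_values = 1b, 2b, 3b ;
    surface_type:flag_meanings = "ocean bare_ground forest" ;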

* Re Bryan's example

> float temperature(realization,time,lat,lon):
> temperature:coordinates = 'realization time lat lon' ;
> temperature:ancillary_variables = 'metadata1 metadata2 metadata3 '
>
> # Note also I'm using ancillary variables not coordinate variables, it's
> # only the realization that we're using in this way
>
> char metadata1(realization,len100):
> metadata1:external_vocabulary = http://wmo.foo.int/identifierY

I raised this before, but I still don't think these should be ancillary
variables. Ancillary variables provide metadata for each value of a data
variable, such as its quality flag or uncertainty. I think the metadata
variables, which have the realization dimension, are auxiliary coordinate
variables and should be included in the coordinates attribute, not in an
ancillary_variables attribute. They are auxiliary coordinate variables of the
kind described by CF section 6 on labels etc. They should each be identified
by their standard name. So I would put

  float temperature(realization, time, lat, lon) ;
    temperature:coordinates = "time lat lon metadata1" ;
  char metadata1(realization, len100) ;
    metadata1:standard_name = "institution" ;  // for instance
    metadata1:external_vocabulary = "http://wmo.foo.int/identifierY" ;

I proposed that the new standard names would be the same as the new or
existing global and local attributes that we use for these purposes
(such as institution).

* We have informally developed (but not included in the standards document) a
convention for handling forecast/analysis time
http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001008.html
If there is only a need for one kind of time axis (only forecast or only
analysis), it can be a 1D time coordinate as usual. If there are two kinds of
time, it is recommended to use a 1D index axis, with auxiliary coordinate
variables to specify analysis time and forecast time or period. This latter
arrangement is flexible because it allows gaps in the 2D matrix. NetCDF-4
could support it with ragged arrays, which I presume are internally mapped
onto a single dimension in just this way. In a recent posting, Steve said he
and Bryan also felt that 2D time was not needed, which is consistent with the
earlier discussion.
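
A minimal sketch of the 1D index axis arrangement, assuming the existing
standard names time and forecast_reference_time for the two kinds of time (the
dimension name record and the variable names are hypothetical):

  dimensions:
    record = UNLIMITED ;   // one entry per (analysis, forecast) combination; gaps allowed
  variables:
    double time(record) ;                   // validity (forecast) time
      time:standard_name = "time" ;
      time:units = "hours since 2006-01-01 00:00" ;
    double analysis_time(record) ;          // analysis (reference) time
      analysis_time:standard_name = "forecast_reference_time" ;
      analysis_time:units = "hours since 2006-01-01 00:00" ;
    float temperature(record, lat, lon) ;
      temperature:coordinates = "time analysis_time" ;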

* In a sense, everything is provisional, in that we could always change it
later (as we can do with aliases for standard names). However, we are always
facing the problem that people want a convention to use straight away. Often
they immediately use it to produce lots of data, or they even produce the
data before the convention is finally agreed, since our discussions are slow.
It does not really help in practice to say that the convention is provisional.
If a mountain of useful data exists, changing the convention presents a
problem of backward compatibility. That's not to say that we can't do it, but
I conclude that what we have to do is think as carefully as possible (and as
fast as possible) so that we decide wisely, and minimise the possibility of
having to change afterwards. We should not take the attitude that our decisions
are prototypes and that it is therefore less important to get them right,
because prototypes have a habit of remaining in use, and this is even truer
when it is archived data, not just software, that is concerned. A corollary,
in my view, is that we must limit our scope to what is needed. We should
certainly keep an eye out for obvious generalisations that are likely to be
required soon, but we need not try too hard to predict future uses.

* It would be useful if GrADS could support auxiliary coordinates and bounds,
since these are generally valuable features - especially bounds, which I find
practically essential. However, I understand that effort is limited!
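
For reference, the bounds mechanism amounts to no more than this (a minimal
sketch with hypothetical sizes):

  dimensions:
    time = 12 ;
    nv = 2 ;
  variables:
    double time(time) ;
      time:units = "days since 2006-01-01" ;
      time:bounds = "time_bnds" ;
    double time_bnds(time, nv) ;   // start and end of each time interval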

Best wishes

Jonathan