[CF-metadata] Getting back to ensembles

From: Jonathan Gregory <j.m.gregory>
Date: Thu, 23 Nov 2006 00:03:27 +0000

Dear Bryan, John, Paco, Brian, et al.

I think we are all happy with the idea of a realization dimension and
auxiliary string coordinate variables (labels) with this dimension that
contain the various metadata.

John said that he would require one of these variables to have unique values
(no repeats) so it could be used for selection. Brian pointed out that the
current CF (section 6) provision for label coordinates has no such restriction.
However, unique values are required for coordinate variables for the same
reason that John wants them in this case. If we could regard a realization
variable as a coordinate variable rather than as alternative label information,
the uniqueness requirement would be natural. In netCDF, string-valued
arrays have to be 2D character variables, because there isn't support for
true arrays of strings, but this is a restriction of the file format, and
netCDF-4 does have strings (I believe). In netCDF-4 it would be natural, I
think, to have a 1D array of strings whose name was the same as the name of
its dimension, e.g. realization(realization), i.e. a coordinate variable. We
can't do this now: it has to be realization(realization,stringlength) and we
even recommend against having a multidimensional array whose name matches the
name of a dimension, so that it is not misidentified as a coordinate variable.
But for strings it isn't "logically" multidimensional. So I would propose:

* If a 2D auxiliary coordinate variable is of char type and its name is the
same as the name of its first dimension, it should be regarded as a string-
valued coordinate variable, and its string values should all be distinct.
Coordinate variables are also required to be monotonic. Do you want to make
this requirement too for string-valued ones? I can't see the need for it
myself, because you don't perform operations on string-valued variables or
use them for plotting positions, as you do with numeric ones.

* If it has any other name, it should be regarded as a label/alternative
coordinate variable as currently in CF section 6, and the restriction does
not apply.
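
To make the two cases concrete, here is a minimal sketch in Python of how
software might apply the rule. It is only an illustration under stated
assumptions: the function names, the in-memory representation (each row of the
2D char array given as a fixed-width string) and the example values are all
hypothetical, and no netCDF library is involved.

```python
def decode_rows(rows):
    """Turn fixed-width character rows into strings, dropping trailing padding."""
    return ["".join(row).rstrip("\x00 ") for row in rows]


def classify_char_variable(name, dimensions, rows):
    """Apply the proposed rule to a 2D char variable.

    Returns ("string_coordinate", values) if the variable's name matches the
    name of its first dimension, otherwise ("label", values). A string-valued
    coordinate with repeated values raises ValueError.
    """
    if len(dimensions) != 2:
        raise ValueError("expected a 2D char variable")
    values = decode_rows(rows)
    if name != dimensions[0]:
        # Any other name: an ordinary label, so repeated values are allowed.
        return "label", values
    if len(set(values)) != len(values):
        raise ValueError(f"{name}: string coordinate values must be distinct")
    # Monotonicity is deliberately not checked for string values.
    return "string_coordinate", values


# Hypothetical example data: three ensemble members from two centres.
dims = ("realization", "stringlength")
coord = classify_char_variable(
    "realization", dims, ["ctl     ", "pert1   ", "pert2   "]
)
label = classify_char_variable(
    "institution", dims, ["centre_a", "centre_a", "centre_b"]
)
print(coord)
print(label)
```

In this sketch the institution variable may repeat values because it is only a
label, whereas a repeated value in realization(realization,stringlength) would
be rejected.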

In either case the nature of the metadata needs to be identified. Bryan says
he "strongly disagrees" with my suggestion. I'm not sure which one of them he
dislikes :-) but it might be my proposal that new standard names for these
things (such as institution) should be the same as the new or existing names
for global and data variable attributes. It seems sensible to me to make them
the same names, so probably the objectionable bit is their designation as
standard names.

Bryan proposes that instead of standard names they should be some new
attribute, because standard names are for physical quantities and we should
know what is physical. John may also see an advantage in this. I can see there
is a sort of difference, but I don't really think it's clear-cut. We already
have standard names for some things that aren't really physical quantities in
Bryan's sense e.g. land_cover and region (string-valued), and various platform_
names (describing the observation platform rather than any quantity which is
observed). I suspect there are grey areas. Why should we make a functional
distinction, for instance, between the ensemble member number and the analysis
time for various forecasts made for the same verification time? I think Bryan
would put these in different categories. Moreover, the standard_name and the
new attribute will have to be searched for in the same kind of way, and the
values of the coordinates they label have to be treated in the same ways
(selection, labelling axes, etc.); it complicates the software to have to
do this with two different attributes and two different tables to look them up
in rather than one. It seems simpler to me to make them standard names. If
there is a need for a distinction, we could make it with some additional
attribute. What different treatments are needed that should be flagged?

I believe we agree that it's useful to have an attribute that points to an
external dictionary. This could be useful for existing string-valued
standard-named quantities such as region. I agree that it would be good if
these tables weren't maintained by us, but I don't think we can do that (as
I've said before) without an agreement with the dictionary maintainer about
the format and content of the dictionary. Do others think the same?

Best wishes

Jonathan
Received on Wed Nov 22 2006 - 17:03:27 GMT
