[CF-metadata] a different (but perhaps unoriginal) approach to standard name construction from Jonathan Gregory on 2008-11-03 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <j.m.gregory>
Date: Mon, 3 Nov 2008 13:16:46 +0000

Dear Karl et al.

Standard names are certainly a difficult business and it's a good idea to
discuss how we should be dealing with them. They are much more than names, as
Julia Collins remarked.

In your email, Karl, I am unclear whether you are proposing to replace the
single standard_name attribute with many attributes, or to construct the
single attribute by a more systematic procedure. I think there are several
arguments in favour of having a single attribute:

* There are several possible optional qualifying bits of information
(at_SURFACE, due_to_PROCESS, assuming_CONDITION, etc.). With a single
attribute, you know at once what is specified. With separate attributes, you
would need a separate way to find out what attributes might possibly be
specified, in order to check whether they were there.

* It makes sure that essential definitive information is included, such as the
sign-convention. Separate attributes can be accidentally omitted.

* It is more convenient in a program to examine a single attribute to find
what you want. If the information is in several attributes, you need to have
several if-conditions ANDed together. (Of course, sometimes this is necessary
anyway if there is a condition on coordinates or cell_methods too, but we
don't want to make things unnecessarily difficult.)

* The order of the bits of information may be significant e.g. the two
components of a tensor, or the order of transformations. If they are in
separate attributes, the ordering would be more awkward to record.
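
To make the third point concrete, here is a minimal sketch of the two checks a program would have to make. Variable metadata is modelled as plain Python dicts rather than read from a file. The attribute names base_quantity, transformation and process are hypothetical, invented only for this illustration, and are not part of CF. The standard name is used purely as an example.

```python
# Variable metadata modelled as plain dicts; no netCDF library is involved.
TARGET = "tendency_of_air_temperature_due_to_longwave_heating"

single = {"standard_name": TARGET}

# Hypothetical split of the same information over several attributes.
# These attribute names are assumptions for illustration, not CF attributes.
split = {
    "base_quantity": "air_temperature",
    "transformation": "tendency",
    "process": "longwave_heating",
}


def matches_single(attrs):
    """One comparison identifies the quantity."""
    return attrs.get("standard_name") == TARGET


def matches_split(attrs):
    """Several conditions must be ANDed together to say the same thing."""
    return (
        attrs.get("base_quantity") == "air_temperature"
        and attrs.get("transformation") == "tendency"
        and attrs.get("process") == "longwave_heating"
    )


# An attribute left out by accident: nothing marks the metadata as incomplete.
incomplete = {"base_quantity": "air_temperature", "transformation": "tendency"}
```

With the split form, a reader of incomplete cannot tell whether the process was omitted by mistake or was never meant to be specified, which is the concern raised in the second point.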

In favour of many attributes, you suggest that you might want to hold in a
single data array the data for quantities which currently have various standard
names, e.g. concentrations of different chemical species or contributions of
various processes. This might be desirable, but I'm not convinced. This is
being discussed in the chemistry thread, where it has been remarked that
although chemical models *do* have dimensions over chemical species, it is not
essential to write out the data in that way. But it could be done, by making
the chemical species a coordinate dimension, as we have been discussing. It
could be done in general by making the standard_name a coordinate variable,
instead of an attribute, if there is a strong reason for doing it.
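
As a rough sketch of the two layouts, again using plain Python structures rather than a real file: layout A keeps one variable per species, each with its own standard name, and layout B puts the species on a string-valued coordinate of a single array. The dict keys (dims, coords, data), the species labels and the numbers are all assumptions made up for illustration. How the combined variable itself would be named is left open here, as it is in the discussion.

```python
# Layout A: one variable per species, each carrying its own standard name.
per_species = {
    "o3": {
        "standard_name": "mole_fraction_of_ozone_in_air",
        "data": [4.0e-8, 4.2e-8],
    },
    "co": {
        "standard_name": "mole_fraction_of_carbon_monoxide_in_air",
        "data": [1.1e-7, 1.0e-7],
    },
}

# Layout B (hypothetical): a single array with a leading species dimension
# and a string-valued coordinate that labels each row.
combined = {
    "dims": ("species", "time"),
    "coords": {"species": ["ozone", "carbon_monoxide"]},
    "data": [
        [4.0e-8, 4.2e-8],
        [1.1e-7, 1.0e-7],
    ],
}


def select_species(var, name):
    """Look up one species by its coordinate label instead of by variable."""
    index = var["coords"]["species"].index(name)
    return var["data"][index]
```

In layout B the species is found by searching the coordinate values, so the identifying information has moved from the attribute into the data.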

Of course, I am not arguing entirely against having more than one attribute.
We do have cell_methods and standard_name qualifiers as well as basic standard
names and coordinates, for instance. I think there's a good reason to separate
out qualifiers which are always or usually relevant.

Supposing we continue with a single attribute, the issue is whether we can
construct it with less effort. You say yourself that this is perhaps the more
important objective. I agree that if we had a system for assembling new
standard names from existing components, it would be useful. It would help
people find out whether there was in fact already a name for something, and it
would make sure we put things in the same order and used the same phrases
wherever relevant. I believe there is actually more system to the existing
names than the Guidelines indicate, and an automatic system would make this
apparent.

However, the cases which are extensions of existing patterns are already the
easy ones. They are not the cases which take most of the time and effort to
deal with, I think. I've said this before - do you think I'm mistaken in this
perception? Consider recent examples:

* The long debate about extreme statistics was mostly about how the metadata
should be organised among standard name, coordinate variable and cell methods,
not about the choice of standard name per se.

* The thread about "date and time" is so far more about what we want to
distinguish than how to do it.

* I listed in another thread some questions that Stephen Griffies and I have
been discussing for ocean quantities for CMIP5. These are the kind of
decisions that took most time, not actually stringing together a name:
- Basin masks for tracer and velocity are the same geophysical quantity, but
distinguished by coordinates.
- What does "ideal age" of sea water mean?
- Is the mixed-layer depth determined by a buoyancy criterion the same concept
as mixed-layer depth determined by sigma-theta?
- Transports across various straits are all the same geophysical quantity, and
the strait should be identified by some string-valued coordinate.
- How do we most usefully categorise the various kinds of ocean mixing in a
way which will be helpful for comparing models?
- What is the clearest way to describe the energetics of vertical mixing: is
it the rate of work against stratification, or the rate of change of potential
energy?

While a better description of what we are doing would clarify the existence of
difficult cases and help us think about them, I don't think it would reduce the
hard work of deciding on the new distinctions, elements and constructions.

However, I think it would still be valuable to follow this up. Because of the
issue of ordering, and the large number of qualifiers, I think Robert
Muetzelfeldt's description of the problem by using a grammar is more
appropriate than using a number of independent attributes.
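
A minimal sketch of what assembling a name from components in a fixed order might look like is given below. It is a deliberate simplification and not the actual construction rules of the Guidelines: the function name, its parameters and the particular ordering of qualifiers are assumptions for illustration only.

```python
def build_standard_name(base, transformations=(), surface=None,
                        process=None, condition=None):
    """Assemble a name from components, always in the same fixed order."""
    name = base
    # Optional qualifiers are appended as suffixes in one agreed order.
    if surface is not None:
        name += f"_at_{surface}"
    if process is not None:
        name += f"_due_to_{process}"
    if condition is not None:
        name += f"_assuming_{condition}"
    # Transformations are prefixed one by one; the first in the list is
    # applied first (innermost), so their order changes the result.
    for transformation in transformations:
        name = f"{transformation}_of_{name}"
    return name


heating = build_standard_name(
    "air_temperature",
    transformations=["tendency"],
    process="longwave_heating",
)
print(heating)  # tendency_of_air_temperature_due_to_longwave_heating
```

Because the transformations are an ordered sequence, build_standard_name("x", ["square", "tendency"]) and build_standard_name("x", ["tendency", "square"]) give different names. A set of independent attributes has no natural way to record that ordering.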

Finally, I wonder whether you could say more about what you mean by "this
standard_name business seems a bit out of control"?

Best wishes

Jonathan
Received on Mon Nov 03 2008 - 06:16:46 GMT
