[CF-metadata] CF standard names : request for statement of the issues from Jonathan Gregory on 2008-11-30 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <j.m.gregory>
Date: Sun, 30 Nov 2008 11:01:34 +0000

Dear John

> We'd like to come up with a clear statement of what standard names are (or
> should be), and what are the problems and issues that we should be focusing
> on next.

Thanks for this posting. We've had several discussions about what standard
names are for and how they are constructed, and I've found those discussions
helpful to clarify ideas. This is what I currently think (partly repeating
bits of recent postings):

* Standard names are not really "names". They are very brief definitions of
the quantities concerned, answering the question, "What does that mean?".
Therefore they are often longer than the terms used in scientific literature.

* Standard names are an important element of the purpose of CF, to "define
metadata that provide a definitive description of what the data in each
variable represents ... This enables users of data from different sources to
decide which quantities are comparable". Therefore standard names distinguish
quantities which need to be distinguished, but they can also be deliberately
vague when quantities from different data sources should share a standard
name because they are regarded as comparable. Standard names thus have various
degrees of precision, and the choice of which set of standard names to use
depends on the application.

* Like CF in general, the standard name table was initially intended for
climate and forecast model output, for describing properties of the simulated
world. The same standard names obviously apply for the same quantities
measured or inferred in the real world. However, for measurements we also have
to describe (a) "raw" data, which comes from instruments and is used to produce
data about the real world, and (b) properties of the measurement system. We
have added some standard names for these purposes, but we may need a clearer
policy for doing it.

* It takes time and effort to devise new standard names. Proposals which are
analogous to existing names can often be agreed quickly. The hard work comes
in deciding how to describe new concepts in a way which is clear and
consistent with existing names. This work requires scientific understanding of
the concepts being described, and thus depends on relevant expertise. In order
to make this go faster, it might help to have better tools for analysing the
existing names.
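
One possible form such a tool could take is sketched below in Python. It compares a proposed name with existing ones word by word and reports the shared leading and trailing words as a pattern, so that analogous names stand out. The function names, the threshold and the short list of names are assumptions made for illustration; this is not an existing CF utility, and the list is not an extract of the real table.

```python
def shared_pattern(a, b):
    """Return (pattern, n_shared) for two underscore-separated names.

    The pattern keeps the leading and trailing words the names have in
    common and replaces the differing middle part with '*'.
    """
    ta, tb = a.split("_"), b.split("_")
    limit = min(len(ta), len(tb))
    p = 0  # number of shared leading words
    while p < limit and ta[p] == tb[p]:
        p += 1
    s = 0  # number of shared trailing words, not overlapping the prefix
    while s < limit - p and ta[-1 - s] == tb[-1 - s]:
        s += 1
    pattern = "_".join(ta[:p] + ["*"] + ta[len(ta) - s:])
    return pattern, p + s


def analogues(proposal, existing, min_shared=4):
    """List existing names sharing at least min_shared words with proposal."""
    found = []
    for name in existing:
        pattern, n_shared = shared_pattern(proposal, name)
        if name != proposal and n_shared >= min_shared:
            found.append((name, pattern))
    return found


# Illustrative names only (assumed for this sketch).
EXISTING = [
    "mole_fraction_of_ozone_in_air",
    "mole_fraction_of_carbon_dioxide_in_air",
    "mass_fraction_of_ozone_in_air",
    "air_temperature",
]

for name, pattern in analogues("mole_fraction_of_methane_in_air", EXISTING):
    print(name, "matches", pattern)
```

With these inputs only the two mole-fraction names are reported, both under the pattern mole_fraction_of_*_in_air, which is the kind of hint that a proposal follows an established construction.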

* We attempt to construct standard names systematically, using words and
phrases with consistent meanings and in a consistent order. This is to avoid
implying illusory distinctions, and to reduce mistakes which would be made if
names differed unexpectedly. Some of the rules are written down in the
guidelines, but these are not comprehensive. It would probably help speed up
development if we did state all the rules explicitly. That would make it more
obvious when a new proposal is like existing ones, and when we have to decide
on new patterns or vocabulary.

* The guidelines are not followed in all cases, because for some standard
names we have adopted familiar but unsystematic terms. Also, there is often
more than one possible systematic description of a quantity, but obviously
only one can be chosen for the table.

* We try to use familiar words and phrases when choosing standard names, but
it is more important for them to be self-explanatory and to avoid jargon, the
target audience being any scientific user of the data. The names should at
least indicate to any such user which general area they refer to.

* Quantities which have different physical dimensions (different SI units) are
always regarded as distinct, and must have different standard names. Units
must be consistent with the standard name; we do not use units to distinguish
between quantities.
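
The rule that units must agree with the standard name can be illustrated with a small Python check. This is a sketch under strong assumptions: the three-entry table is a hypothetical extract, and the parser only understands unprefixed symbols with integer exponents (e.g. "kg m-2 s-1"), with no unit conversion of the kind a real units library would provide.

```python
import re

# Hypothetical extract: standard name -> canonical units.
CANONICAL_UNITS = {
    "air_temperature": "K",
    "precipitation_amount": "kg m-2",
    "precipitation_flux": "kg m-2 s-1",
}

# A unit token is a symbol followed by an optional signed integer exponent.
_TOKEN = re.compile(r"([A-Za-z]+)(-?\d+)?")


def dimensions(units):
    """Parse a string such as 'kg m-2 s-1' into {symbol: exponent}."""
    dims = {}
    for token in units.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse unit token {token!r}")
        symbol = match.group(1)
        exponent = int(match.group(2) or 1)
        dims[symbol] = dims.get(symbol, 0) + exponent
    # Drop symbols whose exponents cancel out.
    return {s: e for s, e in dims.items() if e != 0}


def units_consistent(standard_name, units):
    """True if units have the same dimensions as the canonical units."""
    return dimensions(units) == dimensions(CANONICAL_UNITS[standard_name])


print(units_consistent("precipitation_flux", "kg m-2 s-1"))
print(units_consistent("precipitation_flux", "kg m-2"))
```

The second call fails the check: a quantity in kg m-2 is an amount rather than a flux, so it needs the other standard name instead of the same name with different units.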

* Standard names do not provide metadata which could have infinitely many
possible values. In particular, spatiotemporal coordinates and numerical
parameters are specified by coordinate variables, not as part of the standard
name. That means there is not a standard name for 2 m air temperature, for
example, since CF regards "2 m" as a coordinate. However, surfaces which are
identified by a physical description rather than a parameter value (e.g. toa,
the top of the atmosphere) are included in standard names, because there is
only a small set of possibilities.
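
The 2 m air temperature case can be sketched in Python as follows. Plain dictionaries stand in for netCDF variables here; that layout, the variable name tas and the describe helper are assumptions for illustration, while the attribute names (standard_name, units, positive, coordinates) are those used by CF.

```python
# Hypothetical plain-dict stand-ins for netCDF variables.
height = {
    "values": [2.0],  # single-valued coordinate: the "2 m" lives here
    "attrs": {"standard_name": "height", "units": "m", "positive": "up"},
}
tas = {
    "attrs": {
        "standard_name": "air_temperature",  # no "2 m" in the name
        "units": "K",
        "coordinates": "height",
    },
}


def describe(var, coords):
    """Build a readable description from the name and scalar coordinates."""
    text = var["attrs"]["standard_name"]
    for cname in var["attrs"].get("coordinates", "").split():
        coord = coords[cname]
        if len(coord["values"]) == 1:
            value = coord["values"][0]
            cattrs = coord["attrs"]
            text += f" at {cattrs['standard_name']} = {value} {cattrs['units']}"
    return text


print(describe(tas, {"height": height}))
```

Changing the level to 10 m only changes the coordinate value; the standard name is untouched, which is why no separate name is needed for each height.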

* We could use string-valued coordinates for parts of standard names that
could be regarded as parameters with a discrete set of values, like chemical
species. We haven't decided to do that yet, but it's a possibility. In that
approach any combination of parameter and standard name would be allowed,
whereas when the parameter is part of the standard name (as is the case at
present with chemical species) the legal combinations are defined explicitly
by the standard name table. The latter makes more work in constructing the
standard name table, but avoids nonsensical metadata.
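
The trade-off between the two approaches can be made concrete with a short Python sketch. Everything in it is illustrative: the names are chosen to resemble table entries but are not claimed to match the table exactly, and the generic "X" names and both validation functions are hypothetical.

```python
# Present approach: every legal name is listed explicitly in the table.
TABLE = {
    "mole_fraction_of_ozone_in_air",
    "mole_fraction_of_carbon_dioxide_in_air",
    "mass_fraction_of_dust_dry_aerosol_in_air",
}


def valid_enumerated(name):
    """A name is legal only if the table contains it."""
    return name in TABLE


# Alternative: a generic name plus a string-valued species coordinate.
GENERIC_NAMES = {"mole_fraction_of_X_in_air", "mass_fraction_of_X_in_air"}
SPECIES = {"ozone", "carbon_dioxide", "dust_dry_aerosol"}


def valid_parameterised(generic_name, species):
    """Any generic name may be combined with any known species."""
    return generic_name in GENERIC_NAMES and species in SPECIES


# A questionable combination: a mole fraction of dust aerosol.
print(valid_enumerated("mole_fraction_of_dust_dry_aerosol_in_air"))
print(valid_parameterised("mole_fraction_of_X_in_air", "dust_dry_aerosol"))
```

The enumerated table rejects the questionable combination because nobody added it, whereas the parameterised scheme accepts it. The cost is that the table needs three entries here where the alternative needs two generic names and a species list.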

* Not all the descriptive part of the metadata is included in the standard
name. Other attributes are also important, such as cell_methods. A separate
attribute is useful to contain metadata that is relevant for a wide range of
quantities, because in that case "factorising" it out of the standard name
leads to a large reduction in the size of the standard name table.
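
The size reduction from factorising can be shown with a small count in Python. The lists are an arbitrary illustrative selection, and the "time_mean_..." style names are hypothetical examples of what an unfactorised table would need; they are not real standard names.

```python
from itertools import product

QUANTITIES = ["air_temperature", "precipitation_flux", "wind_speed"]
METHODS = ["point", "mean", "maximum", "minimum"]

# Hypothetical unfactorised table: time processing folded into the name.
unfactorised = [f"time_{m}_{q}" for q, m in product(QUANTITIES, METHODS)]

# Factorised: generic standard name, processing carried by cell_methods.
factorised = [
    {"standard_name": q, "cell_methods": f"time: {m}"}
    for q, m in product(QUANTITIES, METHODS)
]
names_needed = {v["standard_name"] for v in factorised}

print(len(unfactorised), "names without factorising")
print(len(names_needed), "names with cell_methods factored out")
```

Both forms describe the same twelve variables, but the table grows as quantities times methods in the first case and only with the number of quantities in the second.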

* Common concepts have been proposed as a way to identify particular
combinations of standard names with other metadata. They would complement
standard names, other attributes and coordinates.

Best wishes

Jonathan
Received on Sun Nov 30 2008 - 04:01:34 GMT
