[CF-metadata] standard names for defined global attribute names

From: Jonathan Gregory <j.m.gregory>
Date: Sun, 1 Apr 2012 08:02:34 +0100

Dear Nan

I agree that the use of a string-valued aux coord var for "source" or
"institution" is not the only way to store this kind of information. However,
I think there is some logic to this extension. (Although conventions are
arbitrary, we prefer them to be logical!) I would make two arguments for it:

* It is quite similar to the use of string-valued aux coord vars for "region"
and "area_type", which are also allowed by CF; both of these are string-valued
and have standardised values. region is commonly used, for instance, when
storing the ocean meridional overturning streamfunction in a data variable
(basin,depth,latitude). The basin dimension is an index dimension which
doesn't have a (Unidata) numeric coordinate variable; instead it has a
string-valued aux coord var (following CF sect 6.1) providing labels such as
atlantic_ocean. area_type is used, for instance, to label a dimension that
runs over land surface types e.g. bare soil, grass, needleleaf trees. Both
of these kinds of dimension are concerned with location in horizontal space,
so they are like geographical coordinates, although not numeric.

* The commonest need for source and institution is to label a dimension that
runs over members of an ensemble. This is a frequent need in analysis of model
data, especially from intercomparisons. (If I remember correctly, the use of
source and institution in this way arose from exactly this need.) The
ensemble dimension can be treated just like other dimensions, such as time,
lat and lon, when calculating statistics and other data reductions. Therefore
it seems natural for it to have coordinates. However, if they are string-
valued, they have to be aux coord vars. If the members of the ensemble just
had numbers, we could have a (Unidata) numeric coord var, and we could assign
a standard_name to it. This would then seem even more like a spatial coord
var, such as one with standard_name of model_level_number.
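
As a rough sketch of the two cases above, the region case could look something
like the following. The variable name psi, the dimension sizes and the string
length are only illustrative assumptions, and the data section gives just the
labels.

dimensions:
  basin=2;
  depth=40;
  latitude=180;
  stringlength=64;
variables:
  float psi(basin,depth,latitude);
    psi:standard_name="ocean_meridional_overturning_streamfunction";
    psi:units="m3 s-1";
    psi:coordinates="region";
  // string-valued aux coord var labelling the basin index dimension
  char region(basin,stringlength);
    region:standard_name="region";
data:
  region="atlantic_ocean","global_ocean";

The ensemble case would be analogous, assuming that source and institution
were accepted as standard_names (which is the proposal under discussion, not
established CF). The names member, model and centre and the label strings are
made up for illustration.

dimensions:
  member=3;
  lat=73;
  lon=96;
  stringlength=32;
variables:
  float temperature(member,lat,lon);
    temperature:coordinates="model centre";
  // one label per ensemble member
  char model(member,stringlength);
    model:standard_name="source";
  char centre(member,stringlength);
    centre:standard_name="institution";
data:
  model="model_a","model_b","model_c";
  centre="centre_x","centre_y","centre_z";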

It seems to me that these examples suggest a "formal" motivation for this use
of aux coord vars, namely that the metadata concerned in these cases (source,
institution, area_type, region, ensemble member number) is multi-valued, and
its dimension is one of the dimensions of the data variable. It's therefore
natural to see it like a 1D coordinate variable. When these strings are all
different, and if netCDF classic had a string data type, you could actually
store them in a 1D string-valued coord var.

> Do all the possible instance
> variables (station, profile...)
> need to have standard names, too?

In chapter 9, we decided to use cf_role to indicate an instance variable that
labels the timeseries, profile, etc. I think this is mainly because we do not
mandate what *kind* of label this is. It might be a name, but it might be numeric.
For instance, if we had decided to define a standard_name of
wmo_station_number, that number could provide the station identification, and
the instance variable would have both a cf_role and a standard_name. The
cf_role indicates generically the structural function of the variable, and the
standard_name indicates specifically what the values of it mean.
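
To illustrate, an instance variable carrying both attributes might be sketched
as follows. Note that wmo_station_number is the hypothetical standard_name
mentioned above, not one that has been defined, and the variable names and
dimension sizes are likewise only assumptions for the example.

dimensions:
  station=10;
  time=100;
variables:
  float temperature(station,time);
    temperature:coordinates="station_id";
  int station_id(station);
    // structural function: identifies each timeseries
    station_id:cf_role="timeseries_id";
    // hypothetical standard_name saying what the values mean
    station_id:standard_name="wmo_station_number";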

I think the main inconsistency of using standard_names for things like this is
that they are not really geophysical variables, which is what we mostly define
standard names for. We have to beware of this.
 
> If I understand what you're conveying with this CDL, 'institution'
> it looks really similar to
> the NODC 'instrument' attribute.

I agree that a variable attribute is a logical way to record a label which
applies to the whole data variable. It is hard to draw a distinction between
a string-valued attribute and a scalar string-valued aux coord var. If we
allow source as a standard_name, the same information could be stored like
this:

variables:
  float temperature(lat,lon);
    temperature:source="Hadley Centre";

or like this:

variables:
  float temperature(lat,lon);
    temperature:coordinates="label";
  char label(stringlength);
    label:standard_name="source";
data:
  label="Hadley Centre";

and the first is probably better. However, the second is just a special case
of a *multi-valued* source.

How can we decide between using a variable or using an attribute? I think that
we use variables when they might depend on data dimensions, or when they
themselves need attributes. The former is the reason in this case. Both are
general reasons why CF uses scalar or size-1 coord vars. For example, it's
better to store "height of 1.5 m" in a scalar coordinate variable than to
define a special attribute for "height", because it needs a unit (m) and a
standard_name (height) to describe it, and these are just the same as you
would use for a multi-valued height coord var.
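
A minimal sketch of the height example, in the same style as the fragments
above (the data variable name is just an illustration):

variables:
  float temperature(lat,lon);
    temperature:coordinates="height";
  // scalar coordinate variable: no dimensions, but it carries its own attributes
  float height;
    height:standard_name="height";
    height:units="m";
data:
  height=1.5;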

The case where I think an ancillary_variable is most appropriate is when the
quantity being provided is metadata for *individual elements* of the data
variable. This mechanism was introduced to store things like standard errors,
but in another thread (Martin Boettcher's), there is discussion of providing
a string-valued aux coord var to specify the source for each *element* of the
data variable.

Best wishes

Jonathan
Received on Sun Apr 01 2012 - 01:02:34 BST
