[CF-metadata] scalar coordinates

From: Jonathan Gregory <j.m.gregory>
Date: Fri, 10 May 2013 17:26:29 +0100

Dear Mark et al.

Others may not be aware we have been discussing this for a long time in ticket
95, about the CF data model. I think it's important to be clear that this
discussion is concerned with how CF-netCDF files are interpreted, in an
abstract sense. That has implications for the design of software which is
sympathetic to CF, but it doesn't affect the compliance of CF-netCDF files.

I think the statement in sect 5.7, which you quoted, is the definitive one:
"When a variable has an associated coordinate which is single-valued, that
coordinate may be represented as a scalar variable. ... The new scalar
coordinate variable is a convenience feature which avoids adding size one
dimensions to variables. Scalar coordinate variables have the same information
content and can be used in the same contexts as a size one coordinate
variable." I think that makes it clear that a scalar coordinate variable is
regarded as logically equivalent to a size-one (Unidata) coordinate variable.
I'm certain that is what we had in mind when we wrote that text. If that is
what Mark's option B means, then I vote for option B. But I find option B a
bit unclear in saying a scalar coordinate variable "... explicitly declares a
dimension of size one". The point is, it does not *declare* the dimension. As
the rest of the sentence says, "the dimension is not stated in the file".

We intended it for just the kind of case that Seth describes, e.g. air
temperature at 2 m height. It's convenient to record this kind of thing as a
scalar coordinate variable, because it means you don't have to clutter up the
file and the data variable declaration with size-one dimensions. It's useful
that it is a coordinate variable, because that means it can have a
standard_name, units and bounds. These are equivalent in information content:

  float height; // scalar coordinate variable
    height: standard_name="height";
  float temp(lat,lon);
    temp: standard_name="air_temperature";
    temp: coordinates="height";

  float height(height); // size-one coordinate var, with dimension height=1
    height: standard_name="height";
  float temp(height,lat,lon);
    temp: standard_name="air_temperature";
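
A minimal Python sketch of this equivalence follows. The `Variable` class and the `promote_scalar_coordinate` function are hypothetical illustrations, not part of any netCDF library. The sketch shows that turning the first form into the second means supplying the omitted dimension (named after the coordinate, size 1) and dropping the name from the `coordinates` attribute, since a (Unidata) coordinate variable is found through its dimension.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    # Hypothetical in-memory stand-in for a netCDF variable; not a real library API.
    name: str
    dims: tuple
    attrs: dict = field(default_factory=dict)


def promote_scalar_coordinate(data_var, coord_var):
    """Rewrite the scalar-coordinate form as the size-one coordinate-variable form.

    Returns the new data variable, the new coordinate variable and the implied
    dimension (name -> size 1) that the scalar form leaves out of the file.
    """
    if coord_var.dims != ():
        raise ValueError(f"{coord_var.name} is not a scalar variable")
    # The implied dimension takes the coordinate's name, so the result is a
    # (Unidata) coordinate variable: same name as its only dimension.
    new_coord = Variable(coord_var.name, (coord_var.name,), dict(coord_var.attrs))
    # Drop the name from the coordinates attribute; a coordinate variable is
    # identified through its dimension, so it need not be listed there.
    attrs = dict(data_var.attrs)
    remaining = [n for n in attrs.get("coordinates", "").split() if n != coord_var.name]
    if remaining:
        attrs["coordinates"] = " ".join(remaining)
    else:
        attrs.pop("coordinates", None)
    # Prepend the size-one dimension (a placement choice made to match the CDL).
    new_data = Variable(data_var.name, (coord_var.name,) + data_var.dims, attrs)
    return new_data, new_coord, {coord_var.name: 1}


height = Variable("height", (), {"standard_name": "height"})
temp = Variable("temp", ("lat", "lon"),
                {"standard_name": "air_temperature", "coordinates": "height"})
temp1, height1, implied = promote_scalar_coordinate(temp, height)
print(temp1.dims)    # ('height', 'lat', 'lon')
print(height1.dims)  # ('height',)
print(implied)       # {'height': 1}
print(temp1.attrs)   # {'standard_name': 'air_temperature'}
```

Nothing is lost or invented in either direction apart from the dimension itself, which is the sense in which the two forms carry the same information.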

So I think there is an implied dimension, which we're not putting in the file
because we don't need it. If this is an observational dataset, there may only
ever be one height, as Seth says; but if it was a slice of one level out of a
model dataset, it's natural to think of it as having a size-one dimension
which is omitted from the file. If you had one-level slices in different data
variables, you might want to join them together and reconstruct the height
dimension. It doesn't make any difference to the file; it's just how you think
about it.
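
Reconstructing the height dimension from one-level slices might look like the following minimal Python sketch. `join_levels` is a hypothetical helper and the nested lists stand in for real arrays. It assumes each slice carries its height as a scalar coordinate and that all slices share one horizontal grid; because a coordinate variable must be strictly monotonic, it sorts by height and rejects repeats.

```python
def join_levels(slices):
    """Stack one-level slices into a (height, lat, lon) field.

    `slices` is a list of (height_value, field) pairs, each field a nested list
    indexed [lat][lon]. Returns the values of the reconstructed height
    coordinate variable and the stacked data.
    """
    if not slices:
        raise ValueError("need at least one slice")
    # Coordinate variable values must be strictly monotonic, so sort by height
    # and reject repeated heights.
    ordered = sorted(slices, key=lambda pair: pair[0])
    heights = [h for h, _ in ordered]
    if len(set(heights)) != len(heights):
        raise ValueError("repeated height values cannot form a coordinate variable")
    # Every slice must share the same horizontal grid (rows assumed equal length).
    shape = (len(ordered[0][1]), len(ordered[0][1][0]))
    for h, fld in ordered:
        if (len(fld), len(fld[0])) != shape:
            raise ValueError(f"slice at height {h} has a different lat/lon shape")
    return heights, [fld for _, fld in ordered]


heights, data = join_levels([(10.0, [[1.0, 2.0]]), (2.0, [[3.0, 4.0]])])
print(heights)  # [2.0, 10.0]
print(data)     # [[[3.0, 4.0]], [[1.0, 2.0]]]
```

The file itself is unchanged by any of this; the join is something a reader of the data chooses to do.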

In the example that Richard gives, there are choices which the writer of the
file could record concerning the time, forecast_reference_time and
forecast_period. Setting aside the cases where both forecast_period and
forecast_reference_time are multi-valued (the document Steve pointed to is
about that):

(a) If there are many forecast_periods from one forecast_reference_time, then
forecast_reference_time could be a size-one or scalar coord var, and either
time or forecast_period could be a (Unidata) coord var, with the other being
an aux coord var of the same dimension.

(b) If there are forecasts at the same forecast_period with various
forecast_reference_times, then forecast_period could be a size-one or scalar
coord var, and either time or forecast_reference_time could be a (Unidata) coord var,
with the other being an aux coord var of the same dimension.
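
Cases (a) and (b) can be laid out as in the minimal Python sketch below, assuming the usual relation time = forecast_reference_time + forecast_period. The function name, the returned dictionary layout and the dates used are illustrative assumptions. The single-valued input becomes the scalar (or size-one) coordinate, and the multi-valued one shares a dimension with time. Which of those two is the (Unidata) coordinate variable and which is auxiliary is left to the writer.

```python
from datetime import datetime, timedelta


def forecast_coordinates(reference_times, periods):
    """Lay out forecast_reference_time, forecast_period and time for case (a) or (b).

    Exactly one of the two inputs must be single-valued; it becomes the scalar
    (or size-one) coordinate. The other is multi-valued and shares one
    dimension with time, computed as reference time + period.
    """
    if len(reference_times) == 1 and len(periods) > 1:
        # Case (a): one forecast_reference_time, many forecast_periods.
        ref = reference_times[0]
        return {"case": "a",
                "single_valued": ("forecast_reference_time", ref),
                "along_dimension": {"time": [ref + p for p in periods],
                                    "forecast_period": list(periods)}}
    if len(periods) == 1 and len(reference_times) > 1:
        # Case (b): one forecast_period, many forecast_reference_times.
        per = periods[0]
        return {"case": "b",
                "single_valued": ("forecast_period", per),
                "along_dimension": {"time": [r + per for r in reference_times],
                                    "forecast_reference_time": list(reference_times)}}
    raise ValueError("expected one single-valued input and one multi-valued input")


out = forecast_coordinates([datetime(2013, 5, 10, 0)],
                           [timedelta(hours=6), timedelta(hours=12)])
print(out["case"])  # a
print([t.isoformat() for t in out["along_dimension"]["time"]])
# ['2013-05-10T06:00:00', '2013-05-10T12:00:00']
```

The sketch deliberately refuses the case where both inputs are single-valued, since that is the degenerate case in which the grouping is not determined by the values alone.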

In these cases, it's well-defined how the three variables are grouped.
Richard is interested in the degenerate case when none of them is
multi-valued. That case could be written as (a) or (b). In practice, it may
naturally be (a) or (b), because it's a single time-slice from a system that
produces one or the other. If so, I would say it's useful for the data-writer
to indicate which. In order to record that two of the size-one
coordinates are related, you must have an explicit dimension of size one,
with one of them a (Unidata) coord var and the other an auxiliary. But, as
Richard says, in this special case you could make all three of them coord
vars or scalar coord vars. I think that would be less informative, so it
doesn't seem better to me. But it's allowed, of course. The data-writer can
make a choice. The user of the data can aggregate it however he/she pleases.
That's not a decision which is up to the data-writer or the CF convention.
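
Why the explicit size-one dimension is more informative can be seen in this minimal Python sketch. `related_coordinates` is a hypothetical helper: it takes a map from coordinate name to dimension tuple, with `()` meaning a scalar coordinate variable, and groups coordinates that share a dimension. A scalar coordinate has no dimension, so nothing in the file relates it to any other coordinate.

```python
from collections import defaultdict


def related_coordinates(coords):
    """Group coordinate names that share an identical dimension tuple.

    Scalar coordinates (dims == ()) each form a group on their own, because
    no shared dimension ties them to anything else.
    """
    groups = defaultdict(list)
    alone = []
    for name, dims in coords.items():
        if dims == ():
            alone.append([name])
        else:
            groups[dims].append(name)
    return sorted(sorted(g) for g in groups.values()) + sorted(alone)


# Degenerate case written as (a): explicit time dimension of size one,
# time as the coordinate variable, forecast_period as an auxiliary.
written_as_a = {"forecast_reference_time": (),
                "time": ("time",),
                "forecast_period": ("time",)}
# The same values written as three scalar coordinate variables.
all_scalar = {"forecast_reference_time": (), "time": (), "forecast_period": ()}

print(related_coordinates(written_as_a))
# [['forecast_period', 'time'], ['forecast_reference_time']]
print(related_coordinates(all_scalar))
# [['forecast_period'], ['forecast_reference_time'], ['time']]
```

Both layouts are legal; the first simply records one more fact, namely which two coordinates vary together.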

I think the description of size-one string-valued auxiliary coordinate
variables in sect 6.1 is careless (that's my fault - I wrote it) because it
doesn't point out that coordinate variables can't be string-valued. That could
be amended in a defect ticket. A string-valued scalar coordinate variable is a
1D character array (dimensioned by string length). I think it is logically
equivalent to a size-one string-valued auxiliary coordinate variable, which is
a 2D character array in which one of the dimensions is string length and the
other is a size-one dimension of the data variable. Again, it's a convenience
feature which allows you to avoid adding a dimension to the file.
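
The two character-array layouts can be compared in a minimal Python sketch, with lists of characters standing in for netCDF char arrays. All function names are hypothetical. NUL padding is an assumption of this sketch, and the region name is only an illustrative value. The scalar form has shape (strlen,) and the size-one auxiliary form has shape (1, strlen), and converting between them loses nothing.

```python
def to_chars(value, strlen):
    """Store a string as a char array of length strlen (NUL padding assumed)."""
    if len(value) > strlen:
        raise ValueError(f"{value!r} is longer than {strlen} characters")
    return list(value.ljust(strlen, "\0"))


def from_chars(chars):
    """Recover the string, dropping trailing NUL padding."""
    return "".join(chars).rstrip("\0")


def scalar_to_size_one(chars):
    """1D (strlen) scalar form -> 2D (1, strlen) size-one auxiliary form."""
    return [list(chars)]


def size_one_to_scalar(rows):
    """2D (1, strlen) size-one auxiliary form -> 1D (strlen) scalar form."""
    if len(rows) != 1:
        raise ValueError("expected exactly one row (a size-one dimension)")
    return list(rows[0])


scalar_form = to_chars("atlantic_ocean", 20)     # shape (20,)
aux_form = scalar_to_size_one(scalar_form)       # shape (1, 20)
print(len(aux_form), len(aux_form[0]))           # 1 20
print(from_chars(size_one_to_scalar(aux_form)))  # atlantic_ocean
```

The only difference between the two is the extra size-one dimension, which the scalar form lets you leave out of the file.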

Best wishes

Jonathan
Received on Fri May 10 2013 - 10:26:29 BST
