[CF-metadata] scalar coordinates from Jonathan Gregory on 2013-06-05 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <j.m.gregory>
Date: Wed, 5 Jun 2013 14:11:41 +0100

Dear Mark

We still have a difference in our interpretation of scalar coordinates. Maybe
this is irreconcilable, but I am reluctant to reach that conclusion. The
reason why this is hard to settle is that, as we have agreed, it has no
implication at all for the netCDF file. It is just a matter of interpretation.
It's hard to decide because we are not constrained by the file format. The
constraints come from what you do with the data, and what design you would
like for data analysis software.

Our difference is about whether scalar coordinate variables represent a CF
construct of their own, or whether they are a convenient way of representing
size-one dimension coordinate constructs and size-one string-valued auxiliary
coordinate constructs, which are normally stored respectively as (Unidata
convention) coordinate variables and (CF) string-valued auxiliary coordinate
variables. As discussed, I don't recall an intention, when scalar coordinates
were introduced, that they would be a new kind of thing; they were intended to
be a handy way of encoding an existing kind of thing, and that is what the words
in the standard document still suggest to me.

Therefore I think we need only two concepts and scalar coordinates are not a
third concept. I argue for this because it's simpler (applying Occam's razor,
if you like). We need only those two concepts to describe coordinate variables
in CF-netCDF files.

That means you have to decide which kind a scalar coordinate variable is
representing. I think the convention (up to CF 1.5 - see DSG email) implies
that if it's a numeric scalar, it's a size-one dimension coordinate construct,
and if it's a single string, it's a size-one auxiliary coordinate construct.
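
As an illustration only, the rule in the paragraph above could be sketched in
Python as below. The function name and the returned labels are hypothetical,
chosen for this sketch; they are not part of the CF standard or of any library.

```python
from numbers import Number


def classify_scalar_coordinate(value):
    """Say which construct a scalar coordinate variable is taken to represent.

    Hypothetical helper: a numeric scalar is read as a size-one dimension
    coordinate construct, a single string as a size-one auxiliary one.
    """
    if isinstance(value, str):
        return "size-one auxiliary coordinate construct"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "size-one dimension coordinate construct"
    raise TypeError("a scalar coordinate must be a number or a single string")
```

Software that reads the file remains free to apply a different rule; the
sketch only shows the default reading argued for here.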

You are concerned this restricts your flexibility. I see why you say that, of
course, but practically speaking I don't think it does. You can't leave it
undecided what a scalar coordinate means (according to my view), but nothing
prevents you from following a different interpretation to the above when you
read the file. That may mean you have adopted a different view from the person
who created the file, but if that person did not *want* to make it clear what
sort it was (which is what you suggest) then surely he or she does not mind
which interpretation you adopt. Equally, you can read the file with one
interpretation, and then change your mind when it's in memory, by converting
dimension coordinate constructs to auxiliaries or vice-versa, creating or
dropping size-one dimensions. This is all easy to do in memory. The data is
completely unaffected, apart from being possibly reshaped by the insertion or
removal of size-one dimensions. It just shuffles the metadata around.
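
A minimal sketch of that in-memory shuffle follows, assuming a field is held
as a plain dictionary with nested lists for the data. The dictionary layout,
the function names and the "height" example are assumptions made for this
illustration, not any existing software's data model.

```python
def insert_size_one_dimension(field, name, value):
    """Read scalar coordinate `name` as a size-one dimension coordinate."""
    return {
        "dims": [name] + field["dims"],
        # Same values as before, wrapped in one extra (size-one) level.
        "data": [field["data"]],
        "dim_coords": {**field["dim_coords"], name: [value]},
    }


def drop_size_one_dimension(field, name):
    """Inverse operation: remove a leading size-one dimension."""
    if field["dims"][0] != name or len(field["data"]) != 1:
        raise ValueError("expected a leading size-one dimension " + name)
    coords = {k: v for k, v in field["dim_coords"].items() if k != name}
    return {
        "dims": field["dims"][1:],
        "data": field["data"][0],
        "dim_coords": coords,
    }


field = {
    "dims": ["lat", "lon"],
    "data": [[1.0, 2.0], [3.0, 4.0]],
    "dim_coords": {"lat": [10.0, 20.0], "lon": [0.0, 90.0]},
}
promoted = insert_size_one_dimension(field, "height", 2.0)
restored = drop_size_one_dimension(promoted, "height")
```

The data values are the same throughout; only the nesting and the metadata
change, and the round trip gives back the original field.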

It matters when you come to aggregate different variables, within a file or
from different files. For instance, you can't aggregate two variables that have
different scalar values of both experiment_id and ensemble_member_number unless
you decide that these are both auxiliary coordinate variables of the same
(omitted) size-one dimension. On the other hand, if you have four data
variables, showing all the possible combinations of two scalars,
(experiment_id 1, ensemble_member_number 1)
(experiment_id 1, ensemble_member_number 2)
(experiment_id 2, ensemble_member_number 1)
(experiment_id 2, ensemble_member_number 2)
you may wish to aggregate them with two size-2 dimensions. Thus, you may need
to change the interpretation of scalars, in order to enable the aggregation or
determine how it's done. But getting that flexibility doesn't require that you
were initially undecided about what the scalars mean. It only needs you to
be able to change your mind about what they mean, which is easy to do (last
paragraph). Aggregation of data variables is *not* part of the CF standard
(at the moment). Therefore what you do about this lies in the realm of
software design, which can adopt its own rules, which the writer of the data
can't influence.
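
The two aggregations described above might be sketched as follows. Since
aggregation is not defined by CF, everything here is an assumption for
illustration: the dictionary layout, the function names, and the made-up
data values.

```python
def aggregate_along_one_axis(variables, scalar_names):
    """Stack variables along one new axis.

    Each scalar becomes an auxiliary coordinate of that single axis.
    """
    return {
        "data": [v["data"] for v in variables],
        "aux_coords": {n: [v[n] for v in variables] for n in scalar_names},
    }


def aggregate_on_grid(variables, row_name, col_name):
    """Arrange variables on two new axes, one per scalar coordinate."""
    rows = sorted({v[row_name] for v in variables})
    cols = sorted({v[col_name] for v in variables})
    lookup = {(v[row_name], v[col_name]): v["data"] for v in variables}
    # Every combination must be present exactly once.
    if len(lookup) != len(rows) * len(cols) or len(lookup) != len(variables):
        raise ValueError("variables do not fill the grid exactly once")
    return {
        "dim_coords": {row_name: rows, col_name: cols},
        "data": [[lookup[(r, c)] for c in cols] for r in rows],
    }


names = ["experiment_id", "ensemble_member_number"]
variables = [
    {"experiment_id": e, "ensemble_member_number": m, "data": 10 * e + m}
    for e in (1, 2)
    for m in (1, 2)
]
# Two variables differing in both scalars: one shared axis, two auxiliaries.
pair = aggregate_along_one_axis([variables[0], variables[3]], names)
# All four combinations: two size-2 dimensions.
grid = aggregate_on_grid(variables, *names)
```

The same scalars are read as auxiliary coordinates in the first call and as
dimension coordinates in the second; the choice is made by the software at
aggregation time.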

Once the data is in memory, you can do what you like. You might wish to be
flexible about interpretation of multi-valued coordinates, not just
scalars. For instance, you might want to combine two data variables (lat,lon)
with different lat and lon coordinate variables and dimensions, by flattening
them both, to make lat and lon auxiliaries of a discrete (index) axis. That's
not how the data was written, but so what? It might be convenient for the use
of the data. It doesn't imply the coordinate variables to be of unspecific
meaning in the first place.
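
The flattening mentioned above could look like the sketch below, again with
plain lists standing in for arrays. The function names and the sample values
are hypothetical.

```python
def flatten_to_index_axis(data, lat, lon):
    """Flatten a (lat, lon) field onto a single discrete index axis.

    lat and lon become auxiliary coordinates, one value per point.
    """
    values, aux_lat, aux_lon = [], [], []
    for i, la in enumerate(lat):
        for j, lo in enumerate(lon):
            values.append(data[i][j])
            aux_lat.append(la)
            aux_lon.append(lo)
    return {"data": values, "lat": aux_lat, "lon": aux_lon}


def combine(a, b):
    """Concatenate two flattened fields along the shared index axis."""
    return {key: a[key] + b[key] for key in ("data", "lat", "lon")}


first = flatten_to_index_axis([[1, 2], [3, 4]], lat=[0.0, 10.0], lon=[0.0, 5.0])
second = flatten_to_index_axis([[5, 6]], lat=[50.0], lon=[100.0, 110.0])
combined = combine(first, second)
```

The two inputs have different lat and lon coordinate variables and different
dimension sizes, yet they combine once lat and lon are treated as auxiliaries
of the index axis.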

Best wishes

Jonathan
Received on Wed Jun 05 2013 - 07:11:41 BST
