[CF-metadata] CF-1.6 DSG clarification: time series & lat/lon coordinates | RE: scalar coordinates

From: Hedley, Mark <mark.hedley>
Date: Wed, 5 Jun 2013 10:47:33 +0000

Hello

The discussion on "CF-1.6 DSG clarification: time series & lat/lon coordinates" provides us with a further example of the use of scalar coordinates and their potential interpretation. I would like to use it to investigate the question of how scalar coordinates should be interpreted.

The conversation on this issue has solidified my view that the scalar coordinate variable is a semantic container requiring slightly different handling from the vector coordinates (Coordinate Variables and Auxiliary Coordinates).

I do not think it is safe to assume that a scalar coordinate variable is merely an encoding short cut with no semantic content. Some care is required in interpreting scalar coordinate variables. In the data modelling activities they are a case which needs handling and representing; they are not 'just DimensionCoordinates'.

Examples H4 and H5 make explicit use of scalar coordinate variables which do not imply logical degrees of freedom in the data variables.

As I have advocated recently, I feel we can maintain all of the historical functionality of scalar coordinate variables as encoding short cuts and clarify their uses, including as part of the definition of discrete sampling geometries, in a coherent and helpful fashion. That is the aim of the discussion thread titled [CF-metadata] scalar coordinates and the perspective I am trying to advocate there.

mark

On the thread "CF-1.6 DSG clarification: time series & lat/lon coordinates":

> John Caron
> If we use the time series featureType as example
> (from http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html#idp8307552)
> AFAIU, the orthogonal multidimensional representation would be:
> float humidity(station,time)
> not
> float humidity(lat, lon, time)

This seems to me to point at semantic differences between a scalar coordinate and a coordinate variable of size one, when used by discrete sampling geometries.

>>> Jonathan Gregory
>>> with alt=lat=lon=1 is legal in CF. In fact the coordinates attribute is not
>>> needed, because these are all (Unidata) coordinate variables (1D, with name
>>> of the variable equal to the name of the dimension). Ignoring the coordinates
>>> attribute, this example is fine in COARDS as well. In the data model, lon lat
>>> alt time are all dimension coordinate constructs with separate domain axes.
>>>
>>> But when there are *two* timeseries, you would not have alt=lat=lon=2. That
>>> would mean three independent size-2 dimensions. This would also be legal in
>>> CF and COARDS, but it means timeseries at a grid of 8 points, not two points.
>>> To deal with this situation, we introduce an index dimension of size 2, and
>>> make alt lat lon auxiliary coordinate variables of this single dimension. In
>>> the data model, there is then only one domain axis for alt lat lon.

This also seems to me to be evidence of semantic interpretation of scalars within the appendix H examples.
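
The counting in Jonathan's second paragraph can be made concrete with a minimal Python sketch. The coordinate values below are hypothetical and are not taken from the thread; the sketch only contrasts three independent size-2 dimensions with a single index dimension of size 2 shared by alt, lat and lon.

```python
from itertools import product

# Hypothetical coordinate values for two stations (illustrative only).
alt = [10.0, 250.0]
lat = [50.7, 51.5]
lon = [-3.5, -0.1]

# Three independent size-2 dimensions: every combination of
# (alt, lat, lon) is a distinct location, i.e. a grid.
grid_points = list(product(alt, lat, lon))

# One index dimension of size 2: alt, lat and lon are auxiliary
# coordinates on that single dimension, so they pair up element-wise.
station_points = list(zip(alt, lat, lon))

print(len(grid_points))     # number of locations on the grid
print(len(station_points))  # number of stations
```

The first encoding describes 8 locations and the second describes 2, which is why the index dimension is needed once there is more than one timeseries.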

>>> Jonathan Gregory
>>> Back to the case of one timeseries: Example H4 shows scalar coordinate
>>> variables for alt lat lon. That is, these size-1 dimensions have been omitted.
>>> In this case, the coordinates attribute is needed; that's how scalar coordinate
>>> variables are attached to the data variable. In the data model (in my opinion)
>>> this is logically equivalent including the size-1 dimensions.

I do not think that this last statement is correct. There are subtleties of interpretation at work here which help to provide the semantics of discrete sampling geometries.
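
As a rough illustration of why the two encodings can at least be told apart, here is a toy Python sketch. The `Variable` class and `classify` function are hypothetical helpers invented for this example, not part of any CF library, and the variable layouts are only loosely modelled on the single-timeseries case discussed above. The sketch shows that the encodings are structurally distinguishable; whether they should be treated as logically equivalent is the question under debate.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Toy stand-in for a netCDF variable (hypothetical, for illustration)."""
    name: str
    dims: tuple                 # dimension names; () means scalar
    coordinates: tuple = ()     # names listed in the 'coordinates' attribute


def classify(data_var, variables):
    """Label each coordinate attached to data_var in this toy model."""
    roles = {}
    # A 1D variable named after one of the data variable's dimensions.
    for dim in data_var.dims:
        v = variables.get(dim)
        if v is not None and v.dims == (dim,):
            roles[dim] = "coordinate variable"
    # Variables attached through the 'coordinates' attribute.
    for name in data_var.coordinates:
        v = variables[name]
        roles[name] = "scalar coordinate" if v.dims == () else "auxiliary coordinate"
    return roles


# Encoding A: size-1 dimensions omitted, lat/lon/alt are scalars.
vars_scalar = {
    "time": Variable("time", ("time",)),
    "lat": Variable("lat", ()),
    "lon": Variable("lon", ()),
    "alt": Variable("alt", ()),
}
humidity_scalar = Variable("humidity", ("time",), coordinates=("lat", "lon", "alt"))

# Encoding B: explicit size-1 dimensions for alt, lat and lon.
vars_size1 = {n: Variable(n, (n,)) for n in ("time", "alt", "lat", "lon")}
humidity_size1 = Variable("humidity", ("alt", "lat", "lon", "time"))

scalar_roles = classify(humidity_scalar, vars_scalar)
size1_roles = classify(humidity_size1, vars_size1)
print(scalar_roles)
print(size1_roles)
```

In this toy model, encoding A reports lat, lon and alt as scalar coordinates attached via the coordinates attribute, while encoding B reports them as coordinate variables with their own dimensions.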

>> John Caron
>> Lets see, its currently not legal as a DSG file, according to the
>> spec. The CDM will barf on it, though I could put in a workaround.
>>
>> Should it be legal? That is, should people be "allowed" to put in
>> extraneous dimensions that only make sense for some values of the
>> dimension length (eg 1 but not >1) ? I think it would be a rather
>> ugly wart, and you would gain no new functionality.

I agree with this analysis; I do not think this approach should be adopted. I would much rather see scalars recognised as being interpreted subtly differently from vector coordinates (coordinate variables and auxiliary coordinates).

>> I also think it would confuse dimensions with coordinates (which has
>> already happened because the distinction isnt obvious). I think we
>> should try to be clear about the use of dimensions because it makes
>> the data model more powerful. So I would prefer not.

I also agree with this perspective.



>>> Jonathan Gregory
>>> Maybe this question raises an issue for chapter 9 and Example H4. The example
>>> is following Section 9.2:
>>>
>>> "If there is only a single feature to be stored in a data variable, there is no
>>> need for an instance dimension and it is permitted to omit it. The data will
>>> then be one-dimensional, which is a special (degenerate) case of the
>>> multidimensional array representation. The instance variables will be scalar
>>> coordinate variables; the data variable and other auxiliary coordinate
>>> variables will have only an element dimension and not have an instance
>>> dimension, e.g. data(o) and t(o) for a single timeSeries."
>>> In the multidimensional array representation, featureType doesn't have to be
>>> coded, because this representation has always existed in CF. We could say that
>>> *if* you encode featureType, there *must* be an instance dimension (of size 1
>>> if appropriate) and that alt lat lon must be auxiliary coordinate variables
>>> with this dimension. That would be a restriction we don't have in CF 1.6, so
>>> it would be a change to CF. What do you think, John C?

I don't like this change. I feel it makes the conventions less easy to use without adding functionality.

Recognition of the status of scalars addresses this issue more neatly.

>> John Caron
>> From my POV, its just a practical question to make things clear
>> between data producer and consumer. I think we allowed the scalar
>> instance coordinates because it was a natural way for producers to
>> think when there was only one feature in the file, ie "why do you
>> want me to add this extra dimension?" As long as the "featureType"
>> attribute is present, as well as the "coordinates" attribute I think
>> the meaning is unambiguous. Requiring a dimension=1 is maybe a bit
>> simpler, but i would still have to deal with the scalar case for
>> versions before we change, so its not really better for me.

I don't think requiring dimensions of size 1 is good here. I think the current implementation of discrete sampling geometries is neat and provides the required information. It is in the process of being adopted in its current form, and changes like this are liable to add confusion, in my opinion.
Received on Wed Jun 05 2013 - 04:47:33 BST
