[CF-metadata] scalar coordinates

From: Steve Hankin <steven.c.hankin>
Date: Fri, 10 May 2013 07:36:25 -0700

All,

I'm for option B, though I might be persuaded to go for option A given a
compelling counter-example. The example that has been given regarding
forecast times seems out of step with common CF practice in the
utilization of CF "forecast run aggregations". That context recognizes
forecast output collections as 5-dimensional datasets -- both the
calendar date of the forecast time step, and the run date of the model
are valid time coordinates. Ambiguity is not desirable in this case.
It is important to be able to traverse the collection along both types
of time axis. (scroll to section 4 at
http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/FmrcAggregation.html)

A single forecast file, lifted from the context of the collection,
really does have two distinct types of degenerate time axes, reflecting
its position in a 5-dimensional conceptual space. The CF file should
not be implying that there is an arbitrary either-or choice between two
dates; it should make clear the semantic distinctions between the two.
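
To make the two traversals concrete, here is a minimal Python sketch, not
tied to any particular netCDF library. It assumes each single-time forecast
file can be summarised by its two scalar time coordinates, using the CF
standard names forecast_reference_time (run date) and time (valid date). The
helper names make_collection and traverse, the dict layout, and the sample
dates are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def make_collection(run_times, lead_hours):
    """Build a hypothetical collection: one dict per single-time forecast file."""
    files = []
    for run in run_times:
        for lead in lead_hours:
            files.append({
                # Run date of the model (scalar coordinate in each file).
                "forecast_reference_time": run,
                # Calendar date of the forecast step (scalar coordinate in each file).
                "time": run + timedelta(hours=lead),
            })
    return files


def traverse(files, axis):
    """Group files by one scalar time coordinate; list the other one, sorted."""
    other = "time" if axis == "forecast_reference_time" else "forecast_reference_time"
    groups = defaultdict(list)
    for f in files:
        groups[f[axis]].append(f[other])
    return {key: sorted(values) for key, values in sorted(groups.items())}


# Illustrative data (assumed): two runs, three lead times each.
runs = [datetime(2013, 5, 10, 0), datetime(2013, 5, 10, 12)]
files = make_collection(runs, lead_hours=[0, 12, 24])

# Traverse along the run-date axis: every valid time produced by each run.
by_run = traverse(files, "forecast_reference_time")

# Traverse along the valid-time axis: every run that forecast each date.
by_valid = traverse(files, "time")
```

Neither grouping is possible if a file records only one of the two dates,
or leaves it unclear which is which. The third quantity, forecast_period, is
simply time minus forecast_reference_time, so it can be derived from the
other two.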

     - Steve

==========================================================

On 5/10/2013 7:20 AM, Seth McGinnis wrote:
> I'll agree with option A.
>
> I can think of a number of cases where scalar coordinate variables
> are a convenient way to record metadata about the positioning
> of the data in space-time, but it's not like the data at other
> positions actually exists and isn't recorded in this file; it's just a
> way of formatting the metadata. Which makes it a bit weird to
> insist that there's always a degenerate dimension associated
> with the scalar coordinate.
>
> Consider surface observations. A scalar coordinate is a sensible
> way to record e.g. the height of the observation (2-m screen
> height for temps & humidity vs 10-m anemometer height for winds),
> but it's not as if there's an entire spectrum of different heights
> for the observations that you're sampling from; those heights are
> the only ones that there were or ever will be.
>
> So I can't see any utility in requiring the height to be treated as a
> dimension in that case. But there is some potential disutility, in that
> if you've got software that slices and dices the data along different
> dimensions, adding in a degenerate dimension for the height is
> likely to just clutter things up and confuse the issue.
>
> Cheers,
>
> --Seth
>
>
> On Fri, 10 May 2013 08:56:40 +0000
> "Hattersley, Richard" <richard.hattersley at metoffice.gov.uk> wrote:
>> Perhaps it might be helpful to add some context, i.e. "Why do I care?"
>>
>> My understanding is, Jonathan Gregory and Mark Hedley intended to
>> resolve this ambiguity in a subsequent revision of CF. And that
>> resolution will have an impact on both data producers and data
>> consumers.
>>
>> As a data producer you might care because you're producing data which
>> will become invalid. As a data consumer you might find that software
>> tools interpret data differently, and hence you might have to change
>> your code.
>>
>>
>>> The question is this: "Does a Scalar Coordinate Variable....":
>>>
>>> Option A: Represent either a Coordinate Variable or an Auxiliary
>>> Coordinate? The presence of a scalar does not mandate the existence of
>>> a new dimension; it can imply an undeclared dimension of size one
>>> that is not explicitly defined in the file but it does not have to.
>>>
>>> Or
>>>
>>> Option B: Always represent a Coordinate variable which explicitly
>>> declares a dimension of size one, where this dimension is not stated
>>> in the file? An exception is provided for string scalar coordinate
>>> variables only, which are defined as Auxiliary Coordinates but also
>>> mandate a new dimension of size one.
>> It seems the difference hinges around the concept of "degrees of
>> freedom". In those terms...
>>
>> Option A lets the data producer say, "Here are some scalar pieces of
>> metadata - data consumers can choose what to do with them."
>>
>> Whereas option B implies, "These are the degrees of freedom - no more,
>> no less."
>>
>>
>> One impact of this is in the overdetermined case of time,
>> forecast_reference_time, and forecast_period. Even when a data variable
>> contains data for a single point in time, option B would require the
>> *producer* to decide which two variables describe the two degrees of
>> freedom, and which variable is the dependent variable.
>>
>> But as a consumer I might choose to aggregate a collection of these
>> single-time-point data variables which are best parameterised by a
>> *different* pair of time, forecast_reference_time, or forecast_period.
>> In general, it's not possible for the data producer to know in advance
>> which two variables best parameterise the collection I'm interested in.
>>
>>
>> For this, and other related reasons involving ensembles, I'm in favour
>> of option A.
>>
>>
>> Richard Hattersley
>> Iris Benevolent Dictator
>> Met Office
>> _______________________________________________
>> CF-metadata mailing list
>> CF-metadata at cgd.ucar.edu
>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Fri May 10 2013 - 08:36:25 BST
