
[CF-metadata] weighted time mean vs. conditional time mean.

From: martin.juckes at stfc.ac.uk <martin.juckes>
Date: Wed, 25 May 2016 11:47:04 +0000

Dear Jonathan,

yes, it is absolutely clear that "where" can only be used with area types. It is also clear, I thought, that some of these area types may vary with time: the area type list includes "fire" and "cloud", for example.

7.3.3 gives a template for the cell_methods element:
dim1: [dim2: [dim3: ...]] method [where type1 [over type2]] [within|over days|years] [(comment)]

and goes on to say: "The valid values for dim1 [dim2[dim3 ...] ] are the names of dimensions of the data variable, names of scalar coordinate variables of the data variable, valid standard names, or the word area." There is no stated restriction on the values of "dim" which can precede "where ...".
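To illustrate that the template itself places no syntactic restriction on the name before "where", here is a minimal Python sketch of a regex-based splitter for cell_methods strings. It is hypothetical: the names ELEMENT and parse_cell_methods are invented for this sketch, and it is not a CF-compliant parser. It checks syntax only, not whether names, methods or area types are valid. As a result it accepts "time: mean where clear_sky" just as readily as "area: mean where clear_sky", and whether the former is permitted is the question under discussion.

```python
import re

# Hypothetical pattern for one element of the 7.3.3 template:
#   dim1: [dim2: [dim3: ...]] method [where type1 [over type2]]
#   [within|over days|years] [(comment)]
ELEMENT = re.compile(
    r"(?P<names>(?:[A-Za-z_]\w*:\s+)+)"                       # one or more "name:" tokens
    r"(?P<method>[A-Za-z_]\w*)"                               # the method, e.g. mean
    r"(?:\s+where\s+(?P<type1>\w+)"                           # optional "where type1"
    r"(?:\s+over\s+(?!(?:days|years)\b)(?P<type2>\w+))?)?"    # optional "over type2"
    r"(?:\s+(?P<climo>(?:within|over)\s+(?:days|years)))?"    # climatological clause
    r"(?:\s+\((?P<comment>[^)]*)\))?"                         # optional "(comment)"
)


def parse_cell_methods(s):
    """Split a cell_methods string into elements (syntax only, no validation)."""
    elements = []
    for m in ELEMENT.finditer(s):
        # "time: area: " -> ["time", "area"]
        names = [n.strip() for n in m.group("names").split(":") if n.strip()]
        elements.append({
            "names": names,
            "method": m.group("method"),
            "where": m.group("type1"),
            "over": m.group("type2"),
            "climatology": m.group("climo"),
            "comment": m.group("comment"),
        })
    return elements


if __name__ == "__main__":
    for element in parse_cell_methods("time: mean where clear_sky area: mean"):
        print(element)
```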

You appear to be taking a geographical interpretation of "where" and assuming that it can only apply to spatial information, whereas I have been reading it from a mathematical perspective, in which it can refer to any dimension. In mathematics, statements of the form "sum of A where condition B" carry no implication that "where" has anything to do with area. From this perspective, there is no need to introduce "when".

The use case that prompted this, from SIMIP, corresponds to your 3rd example, in which we are averaging over all points in the cell and time period covered for which the area type is valid, giving each point equal weight. This can be handled, as you and Karl have pointed out earlier, with a comment in the cell_methods string of the form "(weighted by ....)", but I feel that the use case is clear enough that there is a need for it to be treated in the conventions.

regards,
Martin

##############################################################



Dear Martin

In my reading of 7.3.3 and the conformance document, it seems clear that
"where" is intended to be used with area types.

> There is an issue, it appears, about the use of the "where" modifier for cell_methods elements other than "area:". Jonathan believes "where" should only apply for area on the basis that this is where the motivation comes from in the first paragraph of section 7.3.3. The subsequent paragraphs in section 7.3.3 describe the use of "where" with a generic element "name: ....". The compliance document clearly states that "where" can be used with any string.

I'm sorry, I can't find that - please could you point it out? In
http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.6.html
regarding
method [where type1 [over type2]]
it says
The valid values for type1 are the name of a string-valued auxiliary or scalar
coordinate variable with a standard_name of area_type, or any string value
allowed for a variable of standard_name of area_type.

We could generalise area_types to mean "states" so they can apply in time as
well as space. I think all the existing ones could be interpreted in this way
i.e. with the sense of "when" rather than "where". Vegetation is sometimes
present and sometimes absent at any given spot, for instance, just as it is
present in some spots and not others at any given time.

Suppose you want to calculate a radiative flux for a grid-box in cloud-free
air. You can do this on each instantaneous timestep for the cloud-free fraction
of the grid-box, and then calculate a time-mean of these timestep values i.e.
"area: mean where clear_sky time: mean". If the input data supplies a higher
spatial resolution than the grid-box, so you have many timeseries, you could
alternatively do it the other way round, and first calculate, for each of the
points, the value of the flux for those timesteps when there is no cloud, then
calculate an area-mean of these local values i.e. "time: mean where clear_sky
area: mean". These aren't the same because they imply different weights.

For example, suppose you have three points within the grid-box and two times,
and the data is as follows:

a X X
b c X

where X means cloudy, and a, b, c are clear-sky values. According to the first
method, the value is a/2 + b/4 + c/4, and according to the second method it is
a/4 + b/4 + c/2, if I've done my sums right. There is a third method, in which
we consider both time and space together: "time: area: mean where clear_sky".
In this case the value is a/3 + b/3 + c/3.
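The three weightings in the worked example can be checked with a minimal Python sketch. The function names are hypothetical. The sketch assumes that None marks a cloudy sample (X), and that a time or point with no clear-sky sample at all is simply left out of the outer mean. The second of these assumptions is what the a/4 + b/4 + c/2 result implies for the third point, which is always cloudy. Setting one of a, b, c to 1 and the others to 0 (as exact fractions) recovers the weight each method gives that value.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def clear(values):
    """Drop cloudy samples, marked here as None."""
    return [v for v in values if v is not None]


def area_then_time(data):
    # "area: mean where clear_sky time: mean": mean over clear points at each
    # time, then an unweighted mean of those per-time values (times with no
    # clear point are skipped, an assumption of this sketch).
    return mean([mean(clear(row)) for row in data if clear(row)])


def time_then_area(data):
    # "time: mean where clear_sky area: mean": mean over clear times at each
    # point, then an unweighted mean over points that have any clear value.
    columns = [list(col) for col in zip(*data)]
    return mean([mean(clear(col)) for col in columns if clear(col)])


def time_and_area(data):
    # "time: area: mean where clear_sky": every clear sample weighted equally.
    return mean(clear([v for row in data for v in row]))


def grid(a, b, c):
    # Rows are the two times, columns the three points; None means cloudy (X).
    return [[a, None, None],
            [b, c, None]]


def weights(method):
    """Weight given to a, b, c: set one of them to 1 and the others to 0."""
    return [method(grid(*[Fraction(int(i == k)) for i in range(3)]))
            for k in range(3)]


if __name__ == "__main__":
    for name, f in [("area then time", area_then_time),
                    ("time then area", time_then_area),
                    ("time and area", time_and_area)]:
        print(name, [str(w) for w in weights(f)])
    # area then time ['1/2', '1/4', '1/4']
    # time then area ['1/4', '1/4', '1/2']
    # time and area ['1/3', '1/3', '1/3']
```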

If I'm right about this, I think we could make this generalisation and it would
not be problematic. However, as usual, we should only make the change if there
is a use-case which demands it.

Best wishes

Jonathan
