
[CF-metadata] weighted time mean vs. conditional time mean.

From: Jonathan Gregory <j.m.gregory>
Date: Wed, 25 May 2016 15:37:13 +0100

Dear Martin

I took the interpretation that "where" means area because that's why it was
introduced. However I agree that it can be generalised and that the SIMIP
example is a use-case for it. So it would seem appropriate to make a proposal
on a trac ticket for the required small (I expect) changes to the wording of
the convention to make this possibility explicit. The SIMIP case could be
included as an example.

Best wishes

Jonathan

----- Forwarded message from martin.juckes at stfc.ac.uk -----

> Date: Wed, 25 May 2016 11:47:04 +0000
> From: martin.juckes at stfc.ac.uk
> To: cf-metadata at cgd.ucar.edu
> Subject: [CF-metadata] weighted time mean vs. conditional time mean.
>
> Dear Jonathan,
>
> yes, it is absolutely clear that "where" can only be used with area types. It is also clear, I thought, that some of these area types may vary with time: the area type list includes "fire" and "cloud", for example.
>
> 7.3.3 gives a template for the cell_methods element:
> dim1: [dim2: [dim3: ...]] method [where type1 [over type2]] [within|over days|years] [(comment)]
>
> and goes on to say: "The valid values for dim1 [dim2 [dim3 ...]] are the names of dimensions of the data variable, names of scalar coordinate variables of the data variable, valid standard names, or the word area." There is no stated restriction on the values of "dim" which can precede "where ...".
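
To make the template above concrete, here is a minimal Python sketch of a parser for a single cell_methods element. The regular expression and the names `ELEMENT` and `parse_element` are illustrative assumptions, not part of the CF conventions or of any CF library, and the pattern covers only the forms shown in the template (it ignores, for instance, structured comments). It puts no restriction on which dimension names may precede "where", which mirrors the reading of the template discussed here.

```python
import re

# Hypothetical, simplified pattern for ONE cell_methods element of the form
#   dim1: [dim2: ...] method [where type1 [over type2]]
#   [within|over days|years] [(comment)]
ELEMENT = re.compile(
    r"^(?P<dims>(?:\w+:\s+)+)"                      # one or more "name: " prefixes
    r"(?P<method>\w+)"                              # e.g. mean, sum, maximum
    r"(?:\s+where\s+(?P<type1>\w+)"                 # optional "where type1"
    r"(?:\s+over\s+(?!(?:days|years)\b)(?P<type2>\w+))?)?"  # optional "over type2"
    r"(?:\s+(?P<climo>(?:within|over)\s+(?:days|years)))?"  # climatology part
    r"(?:\s+\((?P<comment>[^)]*)\))?$"              # optional "(comment)"
)

def parse_element(text):
    """Split one cell_methods element into its parts (None where absent)."""
    match = ELEMENT.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised cell_methods element: {text!r}")
    parts = match.groupdict()
    # "time: area: " -> ["time", "area"]
    parts["dims"] = [name.rstrip(":") for name in parts["dims"].split()]
    return parts
```

For example, `parse_element("time: area: mean where clear_sky")` yields dims `["time", "area"]`, method `"mean"` and type1 `"clear_sky"`. Whether such a string is valid CF is exactly the question under discussion; the sketch only shows that the template's grammar admits it.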
>
> You appear to be taking a geographical interpretation of "where" and assuming that it can only apply to spatial information, but I have been reading it from a mathematical perspective, in which it can refer to any dimension. In mathematics, statements of the form "sum of A where condition B" carry no implication that "where" has anything to do with area. From this perspective, there is no need to introduce "when".
>
> The use case that prompted this, from SIMIP, corresponds to your 3rd example, in which we are averaging over all points in the cell and time period covered for which the area type is valid, giving each point equal weight. This can be handled, as you and Karl have pointed out earlier, with a comment in the cell_methods string of the form "(weighted by ....)", but I feel that the use case is clear enough that there is a need for it to be treated in the conventions.
>
> regards,
> Martin
>
> ##############################################################
>
>
>
> Dear Martin
>
> In my reading of 7.3.3 and the conformance document, it seems clear that
> "where" is intended to be used with area types.
>
> > There is an issue, it appears, about the use of the "where" modifier for cell_methods elements other than "area:". Jonathan believes "where" should only apply for area on the basis that this is where the motivation comes from in the first paragraph of section 7.3.3. The subsequent paragraphs in section 7.3.3 describe the use of "where" with a generic element "name: ....". The compliance document clearly states that "where" can be used with any string.
>
> I'm sorry, I can't find that - please could you point it out? In
> http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.6.html
> regarding
> method [where type1 [over type2]]
> it says
> The valid values for type1 are the name of a string-valued auxiliary or scalar
> coordinate variable with a standard_name of area_type, or any string value
> allowed for a variable of standard_name of area_type.
>
> We could generalise area_types to mean "states" so they can apply in time as
> well as space. I think all the existing ones could be interpreted in this way
> i.e. with the sense of "when" rather than "where". Vegetation is sometimes
> present and sometimes absent at any given spot, for instance, just as it is
> present in some spots and not others at any given time.
>
> Suppose you want to calculate a radiative flux for a grid-box in cloud-free
> air. You can do this on each instantaneous timestep for the cloud-free fraction
> of the grid-box, and then calculate a time-mean of these timestep values i.e.
> "area: mean where clear_sky time: mean". If the input data supplies a higher
> spatial resolution than the grid-box, so you have many timeseries, you could
> alternatively do it the other way round, and first calculate, for each of the
> points, the value of the flux for those timesteps when there is no cloud, then
> calculate an area-mean of these local values i.e. "time: mean where clear_sky
> area: mean". These aren't the same because they imply different weights.
>
> For example, suppose you have three points within the grid-box and two times,
> and the data is as follows (each row is a time, each column a point):
>
> a X X
> b c X
>
> where X means cloudy, and a, b, c are clear-sky values. According to the first
> method, the value is a/2 + b/4 + c/4, and according to the second method it is
> a/4 + b/4 + c/2, if I've done my sums right. There is a third method, in which
> we consider both time and space together: "time: area: mean where clear_sky".
> In this case the value is a/3 + b/3 + c/3.
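
The three sets of weights quoted above can be checked with a short Python sketch. It assumes that each row of the example is a time and each column a point, and that a row or column with no clear-sky value at all (here the third point) is simply left out of the outer mean; that is the reading which reproduces the quoted figures. All names in the code are illustrative.

```python
from fractions import Fraction

# Rows are the two times, columns the three points; None marks a cloudy cell.
LABELS = [["a", None, None],
          ["b", "c", None]]

def _mean_weights(cells):
    """Equal weights for the clear-sky labels in `cells` (empty dict if none)."""
    valid = [c for c in cells if c is not None]
    return {c: Fraction(1, len(valid)) for c in valid}

def _average(groups):
    """Average several weight dicts, skipping groups with no clear-sky data."""
    groups = [g for g in groups if g]
    out = {}
    for g in groups:
        for label, w in g.items():
            out[label] = out.get(label, 0) + w / len(groups)
    return out

def area_then_time(labels):
    """'area: mean where clear_sky time: mean': area mean per time, then time mean."""
    return _average([_mean_weights(row) for row in labels])

def time_then_area(labels):
    """'time: mean where clear_sky area: mean': time mean per point, then area mean."""
    return _average([_mean_weights(col) for col in zip(*labels)])

def time_and_area(labels):
    """'time: area: mean where clear_sky': every clear-sky cell weighted equally."""
    return _mean_weights([c for row in labels for c in row])

for method in (area_then_time, time_then_area, time_and_area):
    print(method.__name__, {k: str(v) for k, v in method(LABELS).items()})
```

Under those assumptions the sketch gives weights of 1/2, 1/4, 1/4 for the first method, 1/4, 1/4, 1/2 for the second, and 1/3 each for the third, in agreement with the sums above.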
>
> If I'm right about this, I think we could make this generalisation and it would
> not be problematic. However, as usual, we should only make the change if there
> is a use-case which demands it.
>
> Best wishes
>
> Jonathan

----- End forwarded message -----
Received on Wed May 25 2016 - 08:37:13 BST
