[CF-metadata] weighted time mean vs. conditional time mean.

From: martin.juckes at stfc.ac.uk <martin.juckes>
Date: Thu, 26 May 2016 09:17:37 +0000

Thanks Both,

I'll try to formulate an example, starting from the current example 7.7,

regards,
Martin


________________________________________
From: Pamment, Alison (STFC,RAL,RALSP)
Sent: 25 May 2016 15:53
To: Juckes, Martin (STFC,RAL,RALSP); cf-metadata at cgd.ucar.edu
Subject: RE: [CF-metadata] weighted time mean vs. conditional time mean.

Hi Martin,

I think you and Jonathan are both right!

I think we originally introduced 'where' in cell_methods to help avoid a proliferation of standard names such as surface_temperature_where_land, surface_temperature_where_open_sea, surface_temperature_where_snow, etc, and also to avoid ambiguities over how area means were calculated (e.g. area: mean where sea_ice over all_area_types). We weren't thinking about anything other than horizontal dimensions and certainly entries in the area_type table are for spatial quantities even if these can vary with time.

But I agree with this:
> 7.3.3 gives a template for the cell_methods element:
> dim1: [dim2: [dim3: ...]] method [where type1 [over type2]] [within|over
> days|years] [(comment)]
>
> and goes on to say: "The valid values for dim1 [dim2[dim3 ...] ] are the
> names of dimensions of the data variable, names of scalar coordinate
> variables of the data variable, valid standard names, or the word area."
> There is no stated restriction on the values of "dim" which can precede
> "where ..."
so the convention as it stands is ambiguous and at the very least needs correcting.

I support the idea of extending the 'where' convention to apply to dimensions other than horizontal ones. SIMIP is raising all sorts of new use cases, including one in which an area fraction is calculated over parts of a grid box that have sea ice within a particular thickness range (although I think that particular case may have to be handled by a non-standardized comment!)

Best wishes,
Alison

------
Alison Pamment Tel: +44 1235 778065
Centre for Environmental Data Analysis Email: alison.pamment at stfc.ac.uk
STFC Rutherford Appleton Laboratory
R25, 2.22
Harwell Campus, Didcot, OX11 0QX, U.K.



> -----Original Message-----
> From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf
> Of martin.juckes at stfc.ac.uk
> Sent: 25 May 2016 12:47
> To: cf-metadata at cgd.ucar.edu
> Subject: [CF-metadata] weighted time mean vs. conditional time mean.
>
> Dear Jonathan,
>
> yes, it is absolutely clear that "where" can only be used with area types. It is
> also clear, I thought, that some of these area types may vary with time: the
> area type list includes "fire" and "cloud", for example.
>
> 7.3.3 gives a template for the cell_methods element:
> dim1: [dim2: [dim3: ...]] method [where type1 [over type2]] [within|over
> days|years] [(comment)]
>
> and goes on to say: "The valid values for dim1 [dim2[dim3 ...] ] are the
> names of dimensions of the data variable, names of scalar coordinate
> variables of the data variable, valid standard names, or the word area."
> There is no stated restriction on the values of "dim" which can precede
> "where ...".
>
> You appear to be taking a geographical interpretation of "where" and
> assuming that it can only apply to spatial information, but I have been
> reading it from a mathematical perspective, in which it can refer to any
> dimension. In mathematics, statements of the form "sum of A where
> condition B" carry no implication that "where" has anything to do with area.
> From this perspective, there is no need to introduce "when".
>
> The use case that prompted this, from SIMIP, corresponds to your 3rd
> example, in which we are averaging over all points in the cell and time
> period covered for which the area type is valid, giving each point equal
> weight. This can be handled, as you and Karl have pointed out earlier, with a
> comment in the cell_methods string of the form "(weighted by ....)", but I
> feel that the use case is clear enough that there is a need for it to be treated
> in the conventions.
>
> regards,
> Martin
>
> ##############################################################
>
>
>
> Dear Martin
>
> In my reading of 7.3.3 and the conformance document, it seems clear that
> "where" is intended to be used with area types.
>
> > There is an issue, it appears, about the use of the "where" modifier for
> cell_methods elements other than "area:". Jonathan believes "where"
> should only apply for area on the basis that this is where the motivation
> comes from in the first paragraph of section 7.3.3. The subsequent
> paragraphs in section 7.3.3 describe the use of "where" with a generic
> element "name: ....". The compliance document clearly states that "where"
> can be used with any string.
>
> I'm sorry, I can't find that - please could you point it out? In
> http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.6.html
> regarding
> method [where type1 [over type2]]
> it says
> The valid values for type1 are the name of a string-valued auxiliary or scalar
> coordinate variable with a standard_name of area_type, or any string value
> allowed for a variable of standard_name of area_type.
>
> We could generalise area_types to mean "states" so they can apply in time as
> well as space. I think all the existing ones could be interpreted in this way
> i.e. with the sense of "when" rather than "where". Vegetation is sometimes
> present and sometimes absent at any given spot, for instance, just as it is
> present in some spots and not others at any given time.
>
> Suppose you want to calculate a radiative flux for a grid-box in cloud-free
> air. You can do this on each instantaneous timestep for the cloud-free fraction
> of the grid-box, and then calculate a time-mean of these timestep values i.e.
> "area: mean where clear_sky time: mean". If the input data supplies a higher
> spatial resolution than the grid-box, so you have many timeseries, you could
> alternatively do it the other way round, and first calculate, for each of the
> points, the value of the flux for those timesteps when there is no cloud, then
> calculate an area-mean of these local values i.e. "time: mean where clear_sky
> area: mean". These aren't the same because they imply different weights.
>
> For example, suppose you have three points within the grid-box and two times,
> and the data is as follows:
>
> a X X
> b c X
>
> where X means cloudy, and a, b, c are clear-sky values. According to the first
> method, the value is a/2 + b/4 + c/4, and according to the second method it is
> a/4 + b/4 + c/2, if I've done my sums right. There is a third method, in which
> we consider both time and space together: "time: area: mean where clear_sky".
> In this case the value is a/3 + b/3 + c/3.
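
The three weightings in the example above can be checked numerically. The
following is only a minimal sketch, not part of the original message: it
assumes that each row of the small table is one time and each column is one
point, represents cloudy (X) samples as NaN, uses arbitrary illustrative
numbers for a, b and c, and assumes that a time or point with no clear-sky
sample is simply skipped in the outer mean (which is what the quoted sums
imply for the always-cloudy third point). The helper name masked_mean is
hypothetical.

```python
import numpy as np

# Arbitrary illustrative clear-sky values (assumption, not from the thread).
a, b, c = 1.0, 2.0, 4.0

# Assumed layout: rows are the two times, columns are the three points.
# NaN stands for a cloudy (X) sample.
data = np.array([[a, np.nan, np.nan],
                 [b, c,      np.nan]])
clear = ~np.isnan(data)


def masked_mean(values, mask, axis):
    """Mean along `axis` over entries where `mask` is True; NaN if none."""
    total = np.where(mask, values, 0.0).sum(axis=axis)
    count = mask.sum(axis=axis)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)


# "area: mean where clear_sky time: mean":
# clear-sky area mean at each time, then a mean over the times.
per_time = masked_mean(data, clear, axis=1)
method1 = np.nanmean(per_time)

# "time: mean where clear_sky area: mean":
# clear-sky time mean at each point, then a mean over the points
# that have a value (the always-cloudy point is skipped).
per_point = masked_mean(data, clear, axis=0)
method2 = np.nanmean(per_point)

# "time: area: mean where clear_sky":
# every clear-sky sample in time and space gets equal weight.
method3 = data[clear].mean()

print(method1, a / 2 + b / 4 + c / 4)
print(method2, a / 4 + b / 4 + c / 2)
print(method3, a / 3 + b / 3 + c / 3)
```

Each printed pair should agree, which shows that the three cell_methods
strings correspond to three different sets of weights on a, b and c.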
>
> If I'm right about this, I think we could make this generalisation and it would
> not be problematic. However, as usual, we should only make the change if there
> is a use-case which demands it.
>
> Best wishes
>
> Jonathan
Received on Thu May 26 2016 - 03:17:37 BST
