[CF-metadata] weighted time mean vs. conditional time mean. from Jonathan Gregory on 2016-05-21 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <j.m.gregory>
Date: Sat, 21 May 2016 10:31:41 +0100

Dear Martin

In my reading of 7.3.3 and the conformance document, it seems clear that
"where" is intended to be used with area types.

> There is an issue, it appears, about the use of the "where" modifier for cell_methods elements other than "area:". Jonathan believes "where" should only apply for area on the basis that this is where the motivation comes from in the first paragraph of section 7.3.3. The subsequent paragraphs in section 7.3.3 describe the use of "where" with a generic element "name: ....". The compliance document clearly states that "where" can be used with any string.

I'm sorry, I can't find that - please could you point it out? In
http://cfconventions.org/Data/cf-documents/requirements-recommendations/requirements-recommendations-1.6.html
regarding
method [where type1 [over type2]]
it says
The valid values for type1 are the name of a string-valued auxiliary or scalar
coordinate variable with a standard_name of area_type, or any string value
allowed for a variable of standard_name of area_type.

We could generalise area_types to mean "states" so they can apply in time as
well as space. I think all the existing ones could be interpreted in this way
i.e. with the sense of "when" rather than "where". Vegetation is sometimes
present and sometimes absent at any given spot, for instance, just as it is
present in some spots and not others at any given time.

Suppose you want to calculate a radiative flux for a grid-box in cloud-free
air. You can do this on each instantaneous timestep for the cloud-free fraction
of the grid-box, and then calculate a time-mean of these timestep values i.e.
"area: mean where clear_sky time: mean". If the input data supplies a higher
spatial resolution than the grid-box, so you have many timeseries, you could
alternatively do it the other way round, and first calculate, for each of the
points, the value of the flux for those timesteps when there is no cloud, then
calculate an area-mean of these local values i.e. "time: mean where clear_sky
area: mean". These aren't the same because they imply different weights.

For example, suppose you have three points within the grid-box and two times,
and the data is as follows, with one row for each time and one column for each
point:

a X X
b c X

where X means cloudy, and a, b, c are clear-sky values. According to the first
method, the value is a/2 + b/4 + c/4, and according to the second method it is
a/4 + b/4 + c/2, if I've done my sums right. There is a third method, in which
we consider both time and space together: "time: area: mean where clear_sky".
In this case the value is a/3 + b/3 + c/3.
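
As a check on these sums, here is a minimal Python sketch that derives the
weights on a, b and c for each of the three methods. It assumes that the points
have equal area and the timesteps equal length, and that a point or time with no
clear-sky data at all is simply left out of the subsequent mean (which is what
gives a/4 + b/4 + c/2 for the second method). The names grid, clear_cells and
mean_weights are made up for illustration and are not part of CF.

```python
from fractions import Fraction

# Rows are the two times, columns are the three points; None marks a cloudy cell.
# The labels stand in for the clear-sky values a, b, c from the example.
grid = [
    ["a", None, None],
    ["b", "c", None],
]


def clear_cells(cells):
    """Equal weights over the clear-sky cells of one row, column or flat list."""
    clear = [c for c in cells if c is not None]
    return {c: Fraction(1, len(clear)) for c in clear}


def mean_weights(groups):
    """Average several weight dicts, skipping empty ones (no clear-sky data)."""
    groups = [g for g in groups if g]
    total = {}
    for g in groups:
        for label, w in g.items():
            total[label] = total.get(label, 0) + w / len(groups)
    return total


# Method 1, "area: mean where clear_sky time: mean":
# area mean at each time, then time mean.  Gives a/2 + b/4 + c/4.
area_then_time = mean_weights([clear_cells(row) for row in grid])

# Method 2, "time: mean where clear_sky area: mean":
# time mean at each point, then area mean.  Gives a/4 + b/4 + c/2.
columns = list(zip(*grid))
time_then_area = mean_weights([clear_cells(col) for col in columns])

# Method 3, "time: area: mean where clear_sky":
# one mean over all clear-sky cells.  Gives a/3 + b/3 + c/3.
joint = clear_cells([c for row in grid for c in row])

for name, weights in [
    ("area then time", area_then_time),
    ("time then area", time_then_area),
    ("time and area together", joint),
]:
    print(name, {k: str(v) for k, v in sorted(weights.items())})
```

Exact fractions are used so that the weights can be compared directly with the
expressions above, and in each case they add up to 1.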

If I'm right about this, I think we could make this generalisation and it would
not be problematic. However, as usual, we should only make the change if there
is a use-case which demands it.

Best wishes

Jonathan
Received on Sat May 21 2016 - 03:31:41 BST
