
[CF-metadata] statistic indices

From: Jonathan Gregory <j.m.gregory>
Date: Sat, 15 Mar 2008 22:29:14 +0000

Dear Heinke

Thanks for your email and sorry for another long delay in this correspondence
(two months this time). My slowness indicates that although this discussion is
useful and thought-provoking, I don't really have enough time to pursue it,
unfortunately; I am holding up progress. Perhaps the best way to take this
forward, if you would like to, would be for you to make specific proposals as
a Trac ticket proposing an amendment to CF, when you are ready, and I hope that others
will then comment.

You are, I think, suggesting an addition to cell_methods for a number_of_days
standard name, to indicate whether the condition is that the quantity (air
temperature etc.) should be below or above the threshold. That idea has more
general applicability than these statistical indices, and I think it's a good
use of cell_methods. My version of your suggestion is that we should introduce
new cell_methods of "above" and "below" for extensive quantities. If either of
these cell_methods is specified, the coordinate concerned would not be allowed
to have bounds. Instead, "x: [method] above" would mean the data variable (the
probability in this case) applies to the interval of x from the specified
coordinate upwards, and correspondingly for "x: [method] below". The
statistical method has the usual default, if omitted. Thus, "x: above" for a
coordinate value of x is a shorthand for "x: [method]" with coordinate bounds
of (x,infinity), and "x: below" is likewise a shorthand for coordinate bounds
of (-infinity,x). Is that the kind of thing you mean?
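To make the proposed reading concrete, here is a minimal Python sketch of how software might interpret the new qualifiers. It assumes a simplified cell_methods grammar with exactly one name per entry (CF also allows entries such as "lat: lon: mean", which this does not handle), and the helper names parse_cell_methods and implied_bounds are hypothetical, not part of any CF library. It only maps "above" and "below" to the implied interval.

```python
import math


def parse_cell_methods(cell_methods):
    """Split a cell_methods string into (name, words) pairs.

    Simplified, hypothetical parser: assumes each entry has exactly one
    name token ending in ':' followed by its method words.
    """
    entries = []
    for token in cell_methods.split():
        if token.endswith(":"):
            entries.append((token[:-1], []))
        elif entries:
            entries[-1][1].append(token)
    return entries


def implied_bounds(coord_value, words):
    """Interval implied by the proposed 'above'/'below' qualifier.

    'x: [method] above' is read as bounds (x, +inf) and
    'x: [method] below' as bounds (-inf, x); returns None otherwise.
    """
    if "above" in words:
        return (coord_value, math.inf)
    if "below" in words:
        return (-math.inf, coord_value)
    return None


if __name__ == "__main__":
    methods = parse_cell_methods(
        "time: minimum within days c1: below time: sum over days")
    # The c1 entry carries the threshold qualifier; here c1 = -10 degC.
    c1_words = dict(methods)["c1"]
    print(implied_bounds(-10.0, c1_words))  # (-inf, -10.0)
```

Because the qualifier replaces bounds rather than adding to them, a checker built this way would also need to reject a coordinate that has both bounds and an "above"/"below" qualifier.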

With this new qualification, we don't need separate standard names for
above and below threshold, so we can define standard names such as
  number_of_days_satisfying_condition_on_air_temperature
  number_of_days_conditional_on_air_temperature
  number_of_days_dependent_on_air_temperature
Which would you prefer? Do you have other suggestions?

If a standard_name has the phrase "dependent_on_X" (or whatever phrase is
used) it would mean that it must have a coordinate variable, scalar coordinate
variable or auxiliary coordinate variable with standard_name of X, to specify
the condition.
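A possible automated check of this rule, sketched in Python under two assumptions: that "dependent_on" is the phrase eventually chosen from the three candidates above, and that the file is represented by a hypothetical in-memory dict mapping each variable name to its dimensions and attributes (this is not the API of any real netCDF library). Coordinate variables are found through the data variable's dimensions, and scalar or auxiliary coordinate variables through its coordinates attribute.

```python
import re


def condition_variable(var_name, variables):
    """Return the name of the variable holding the condition for a
    '..._dependent_on_X' standard name, or None if the rule does not apply.

    variables: hypothetical dict {name: (dimensions tuple, attributes dict)}.
    Raises ValueError if no suitable coordinate is found.
    """
    dims, attrs = variables[var_name]
    match = re.search(r"_dependent_on_(\w+)$", attrs.get("standard_name", ""))
    if match is None:
        return None
    wanted = match.group(1)
    # Coordinate variables share a name with one of the dimensions;
    # scalar and auxiliary coordinates are listed in 'coordinates'.
    candidates = list(dims) + attrs.get("coordinates", "").split()
    for name in candidates:
        if name in variables and variables[name][1].get("standard_name") == wanted:
            return name
    raise ValueError(f"{var_name}: no coordinate with standard_name {wanted!r}")


if __name__ == "__main__":
    variables = {
        "n2": (("c2", "lat", "lon"),
               {"standard_name": "number_of_days_dependent_on_precipitation_amount",
                "coordinates": "time"}),
        "c2": (("c2",), {"standard_name": "precipitation_amount"}),
    }
    print(condition_variable("n2", variables))  # c2
```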

If a standard_name is a type of number_of_days, its cell_methods must indicate
the statistical processing which has been applied within days. However, this
does not deal with 5-day intervals. At present, CF only has special treatment
for days and years, since these are natural climatological periods. Do we need
to introduce new conventions for statistics calculated over arbitrary periods?
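The requirement that a number_of_days variable state its within-days processing could also be checked mechanically. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the "time: <method> within days" wording used in the examples in this message.

```python
import re


def within_days_method(standard_name, cell_methods):
    """Return the method applied within days (e.g. 'minimum') for a
    number_of_days variable, or None if the name is not of that type.

    Raises ValueError if the required 'within days' entry is missing.
    """
    if not standard_name.startswith("number_of_days"):
        return None
    match = re.search(r"time:\s+(\w+)\s+within\s+days", cell_methods)
    if match is None:
        raise ValueError("number_of_days variable lacks 'time: <method> within days'")
    return match.group(1)
```

A statistic over 5-day periods has no comparable wording to match against, which is exactly the gap in the current conventions noted in the question above.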

Putting all this together, we have

// number of days with daily minimum temperature below -10degC
float n1(lat,lon);
  n1:standard_name="number_of_days_dependent_on_air_temperature";
  n1:coordinates="c1 time";
  n1:cell_methods="time: minimum within days c1: below time: sum over days";
float c1;
  c1:standard_name="air_temperature";
  c1:units="degC";
data:
  c1=-10;
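For reference, the processing that n1's cell_methods describes can be written out directly. This Python sketch assumes sub-daily samples for one grid cell supplied as (day index, temperature in degC) pairs, and it treats "below" as strictly less than the threshold, matching the open interval (-infinity,x) described earlier; whether the boundary value itself should count is not settled in this message. The function name is hypothetical.

```python
def count_days_below(samples, threshold):
    """Number of days whose minimum temperature is below threshold.

    samples: iterable of (day_index, temperature_degC) pairs for one grid
    cell, at any sub-daily frequency.
    """
    # "time: minimum within days": minimum of the samples on each day
    daily_min = {}
    for day, temp in samples:
        daily_min[day] = min(temp, daily_min.get(day, temp))
    # "c1: below" then "time: sum over days": count the qualifying days
    return sum(1 for t in daily_min.values() if t < threshold)


if __name__ == "__main__":
    samples = [(0, -12.0), (0, -5.0), (1, -8.0), (1, -3.0), (2, -15.0)]
    print(count_days_below(samples, -10.0))  # 2
```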


// number of days exceeding 100 mm or 200 mm of accumulated precipitation
float n2(c2,lat,lon);
  n2:standard_name="number_of_days_dependent_on_precipitation_amount";
  n2:coordinates="time";
  n2:cell_methods="time: sum within days c2: above time: sum over days";
float c2(c2);
  c2:standard_name="precipitation_amount";
  c2:units="kg m-2";
data:
  c2=100, 200;
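Because c2 is a dimension, n2 holds one count per threshold. A Python sketch of that calculation for a single grid cell, assuming sub-daily precipitation amounts given as (day index, kg m-2) pairs and reading "above" as strictly greater than, in line with "exceeding"; the function name is hypothetical.

```python
def count_days_above(samples, thresholds):
    """Count days whose accumulated precipitation exceeds each threshold.

    samples: iterable of (day_index, amount_kg_m2) pairs for one grid cell.
    thresholds: values of the c2 coordinate, e.g. [100, 200] kg m-2.
    Returns one count per threshold, i.e. one value along the c2 dimension.
    """
    # "time: sum within days": daily accumulation
    daily_total = {}
    for day, amount in samples:
        daily_total[day] = daily_total.get(day, 0.0) + amount
    # "c2: above" then "time: sum over days", once per threshold
    return [sum(1 for total in daily_total.values() if total > c)
            for c in thresholds]


if __name__ == "__main__":
    samples = [(0, 60.0), (0, 50.0), (1, 250.0), (2, 30.0)]
    print(count_days_above(samples, [100.0, 200.0]))  # [2, 1]
```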

The case when the threshold is geographically varying looks like this:

float n3(lat,lon);
  n3:standard_name="number_of_days_dependent_on_air_temperature";
  n3:coordinates="c3 time";
  n3:cell_methods="time: minimum within days c3: above time: sum over days";
float c3(lat,lon);
  c3:standard_name="air_temperature";
  c3:units="K";
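With a geographically varying threshold the comparison is made cell by cell against c3. A pure-Python sketch, assuming the daily minima have already been computed and are held in nested lists indexed [day][lat][lon] (an illustrative layout, not a CF requirement); the function name is hypothetical.

```python
def count_days_above_field(daily_min, threshold):
    """Per-cell count of days whose daily minimum exceeds a varying threshold.

    daily_min: list over days of 2-D lists [lat][lon] of daily minimum
               temperature in K ("time: minimum within days" already applied).
    threshold: 2-D list [lat][lon] in K, playing the role of c3(lat,lon).
    Returns a 2-D list [lat][lon] of counts, shaped like n3(lat,lon).
    """
    nlat, nlon = len(threshold), len(threshold[0])
    return [[sum(1 for day in daily_min if day[j][i] > threshold[j][i])
             for i in range(nlon)]
            for j in range(nlat)]


if __name__ == "__main__":
    threshold = [[270.0, 280.0], [260.0, 290.0]]
    daily_min = [
        [[271.0, 279.0], [261.0, 291.0]],
        [[269.0, 281.0], [259.0, 289.0]],
        [[275.0, 285.0], [265.0, 280.0]],
    ]
    print(count_days_above_field(daily_min, threshold))  # [[2, 2], [2, 1]]
```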

c3 can have further attributes to describe what it is. For instance, it might
be the 10th percentile of the 1961-1990 climatology. It could say this simply in
the long_name attribute. That would be sufficient for humans to read, but not
convenient if you have to distinguish between various possibilities for c3
using software automatically. If you do, we need some convention for recording
the percentile, perhaps as a scalar coordinate variable, but this may require
some further thought.

Best wishes

Jonathan
Received on Sat Mar 15 2008 - 16:29:14 GMT

