[CF-metadata] standards for probabilities

From: Jonathan Gregory <j.m.gregory>
Date: Fri, 18 Nov 2011 15:23:19 +0000

Dear Vegard

Sorry for the slow response. I've been very busy this week.

> So, a bit more concrete, this is option 1:
>
> float rain_25(time, y, x);
> rain_25:standard_name = "precipitation_amount";
> rain_25:cell_methods = "realization: percentile(25)";

Yes, except that, at the moment, cell_methods only refers to variables and
doesn't contain constants. Therefore I was thinking it could be something like

  float rain(time,y,x);
    rain:cell_methods="realization: percentile pvar";
  float pvar(pvar);

and pvar is a coordinate variable which specifies the percentile(s). If there
is only one percentile, the dimension pvar=1, or pvar could be a scalar. This
syntax is like the second one in the CF standard 7.3.3, for statistics
applying to different area-types, for the same reason: it needs to refer to a
coordinate variable in evaluating a statistic.
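
To make the data layout concrete, here is a minimal Python sketch of how the
values for option 1 might be produced from an ensemble. It is only an
illustration: the function names, the sample numbers and the choice of linear
interpolation between ordered members are my assumptions, and CF does not
prescribe how a percentile is estimated.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) of values.

    Assumption: linear interpolation between order statistics.
    """
    if not values:
        raise ValueError("need at least one ensemble member")
    if not 0 <= p <= 100:
        raise ValueError("percentile must be in [0, 100]")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def collapse_realization(ensemble, pvar):
    """Collapse the realization axis into one value per percentile.

    ensemble[r][i] is member r at grid point i (hypothetical layout).
    The result is indexed as rain[k][i], where k runs over pvar.
    """
    n_points = len(ensemble[0])
    return [
        [percentile([member[i] for member in ensemble], p)
         for i in range(n_points)]
        for p in pvar
    ]


# Hypothetical 5-member ensemble at 2 grid points.
ensemble = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0], [3.0, 8.0], [4.0, 10.0]]
pvar = [25.0, 50.0, 75.0]  # values of the percentile coordinate variable
rain = collapse_realization(ensemble, pvar)
```

The realization axis disappears from the result and survives only as the name
given in cell_methods, while the percentile values live in pvar.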

> The only problem I see with this is that in the resulting cdm realization is
> not used anywhere, apart from possibly in cell methods. But maybe this is ok?

Yes, it is OK, because standard_names can be included in cell_methods, and
realization is a standard_name.

Option 2:

> float precipitation_amount(time, percentile, y, x);
> ...
> float percentile(percentile);
> percentile:units = "1";
> percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount";

To make this method as informative as option 1, the standard_name would be
cumulative_distribution_function_of_precipitation_amount_over_realization.
In option 1, "over realization" is indicated by the cell_methods.

You ask, "But what is the purpose of explicitly referring to
precipitation_amount in the standard name? would not
cumulative_distribution_function be better? Then the same dimension could be
used for other data, such as air_temperature." I agree that would be an
advantage. I suggested that precipitation_amount should be stated by analogy
with the guidelines for probability_density_function_of_X. For a PDF, the
units depend on what X is, so you must have a standard_name which includes X.
A CDF and a PDF are related concepts. However, this is not a strong argument.
If you had a PDF, it would probably be a data variable, not a coordinate
variable like your CDF is here.
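
The point about units can be checked with a small Python sketch. The sample
values and function names below are hypothetical, and the simple counting
estimators are my own choice for illustration. Rescaling X from mm to m leaves
the CDF value unchanged, because it is dimensionless, but changes the PDF
value, whose units are the inverse of the units of X.

```python
def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x (dimensionless)."""
    return sum(1 for s in samples if s <= x) / len(samples)


def histogram_density(samples, lo, hi):
    """Estimated probability density on [lo, hi), in units of 1/X."""
    count = sum(1 for s in samples if lo <= s < hi)
    return count / len(samples) / (hi - lo)


# Hypothetical precipitation amounts, first in mm, then the same data in m.
precip_mm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
precip_m = [v / 1000.0 for v in precip_mm]

cdf_mm = empirical_cdf(precip_mm, 3.0)
cdf_m = empirical_cdf(precip_m, 3.0 / 1000.0)

pdf_mm = histogram_density(precip_mm, 2.0, 6.0)                    # per mm
pdf_m = histogram_density(precip_m, 2.0 / 1000.0, 6.0 / 1000.0)    # per m
```

Here cdf_mm and cdf_m agree, while pdf_m is about 1000 times pdf_mm.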

Regarding Roy's comment, I agree with his concern about proliferation of
percentiles, but I think both of these options allow that generality, as in
both cases the percentile value(s) are in variables.

The advantage of option 2 is that it's a bit simpler and only requires new
standard names, whereas option 1 requires an alteration to the CF convention.
The advantage of option 1 is that it's more compact, and it is natural to
regard percentiles as a cell_method, I would argue. I'm not sure which is
better.

Cheers

Jonathan
Received on Fri Nov 18 2011 - 08:23:19 GMT
