[CF-metadata] standards for probabilities
Dear Vegard
Sorry for slow response. I've been very busy this week.
> So, a bit more concrete, this is option 1:
>
> float rain_25(time, y, x);
> rain_25:standard_name = "precipitation_amount";
> rain_25:cell_methods = "realization: percentile(25)";
Yes, except that cell_methods only refers to variables and doesn't contain
constants at the moment. Therefore I was thinking it could be something like
float rain(time,y,x);
rain:cell_methods="realization: percentile pvar";
float pvar(pvar);
and pvar is a coordinate variable which specifies the percentile(s). If there
is only one percentile, the dimension pvar=1, or pvar could be a scalar. This
syntax is like the second one in the CF standard 7.3.3, for statistics
applying to different area-types, for the same reason: it needs to refer to a
coordinate variable in evaluating a statistic.
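As a sketch only, the layout described above might be written out in full as
the following CDL. The dimension sizes, the units and long_name of pvar, and
the percentile values are illustrative assumptions, and the "percentile pvar"
form of cell_methods is the syntax being proposed here, not part of the
current CF convention.

```cdl
netcdf option1_sketch {
dimensions:
    time = 4 ;   // illustrative sizes only
    y = 3 ;
    x = 3 ;
    pvar = 3 ;   // would be 1 (or pvar a scalar) for a single percentile
variables:
    // Coordinate variable holding the percentile values (name from the text above)
    float pvar(pvar) ;
        pvar:long_name = "percentile over realization" ;  // assumed attribute
        pvar:units = "percent" ;                          // assumed units
    // Data variable: one field per percentile
    float rain(time, pvar, y, x) ;
        rain:standard_name = "precipitation_amount" ;
        rain:units = "kg m-2" ;
        // Proposed syntax: the method names the coordinate variable, not a constant
        rain:cell_methods = "realization: percentile pvar" ;
data:
    pvar = 25, 50, 75 ;  // example percentiles
}
```

With several percentiles, rain takes pvar as an extra dimension, as shown. With
a single percentile the pvar dimension could have size 1 or be dropped in
favour of a scalar pvar.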
> The only problem I see with this is that in the resulting cdm realization is
> not used anywhere, apart from possibly in cell methods. But maybe this is ok?
Yes, it is OK, because standard_names can be included in cell_methods, and
realization is a standard_name.
Option 2:
> float precipitation_amount(time, percentile, y, x);
> ...
> float percentile(percentile);
> percentile:units = "1";
> percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount";
To make this method as informative as option 1, the standard_name would be
cumulative_distribution_function_of_precipitation_amount_over_realization.
In option 1, "over realization" is indicated by the cell_methods.
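For comparison, option 2 with the longer standard_name might look like the
following CDL sketch. The standard_name on the percentile coordinate is the
one suggested above and is not in the standard name table. The dimension
sizes, the data variable's units attribute, and the fractional percentile
values (chosen to match units of "1") are illustrative assumptions.

```cdl
netcdf option2_sketch {
dimensions:
    time = 4 ;        // illustrative sizes only
    percentile = 3 ;
    y = 3 ;
    x = 3 ;
variables:
    // Coordinate variable; "over realization" is carried by the standard_name
    float percentile(percentile) ;
        percentile:units = "1" ;
        percentile:standard_name = "cumulative_distribution_function_of_precipitation_amount_over_realization" ;  // proposed name
    float precipitation_amount(time, percentile, y, x) ;
        precipitation_amount:standard_name = "precipitation_amount" ;
        precipitation_amount:units = "kg m-2" ;
data:
    percentile = 0.25, 0.5, 0.75 ;  // example values, as fractions
}
```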
You ask, "But what is the purpose of explicitly referring to
precipitation_amount in the standard name? would not
cumulative_distribution_function be better? Then the same dimension could be
used for other data, such as air_temperature." I agree that would be an
advantage. I suggested that precipitation_amount should be stated by analogy
with the guidelines for probability_density_function_of_X. For a PDF, the
units depend on what X is, so you must have a standard_name which includes X.
A CDF and a PDF are related concepts. However, this is not a strong argument.
If you had a PDF, it would probably be a data variable, not a coordinate
variable like your CDF is here.
Regarding Roy's comment, I agree with his concern about proliferation of
percentiles, but I think both of these options allow that generality, as in
both cases the percentile value(s) are in variables.
The advantage of option 2 is that it only requires new standard names, whereas
option 1 requires an alteration to the CF convention, so option 2 is a bit
simpler to adopt.
The advantage of option 1 is that it's more compact, and it is natural to
regard percentiles as a cell_method, I would argue. I'm not sure which is
better.
Cheers
Jonathan
Received on Fri Nov 18 2011 - 08:23:19 GMT