[CF-metadata] standards for probabilities from Vegard Bønes on 2011-11-16 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Vegard Bønes <vegard.bones>
Date: Wed, 16 Nov 2011 12:18:32 +0000 (UTC)

Hi,

The methods for estimating probabilities are non-trivial and may change over time. Because of that, I would prefer to keep information about the exact process outside the generated file.

I have not been able to find any references to realization_weight in the standard documents. Could you please point me to the right place?

VG

----- Original Message -----
From: "Jamie Kettleborough" <jamie.kettleborough at metoffice.gov.uk>
To: "Vegard Bønes" <vegard.bones at met.no>, "Jonathan Gregory" <j.m.gregory at reading.ac.uk>
Cc: cf-metadata at cgd.ucar.edu, "Jamie Kettleborough" <jamie.kettleborough at metoffice.gov.uk>
Sent: 16 November 2011 11:53:22
Subject: RE: [CF-metadata] standards for probabilities

Hello Vegard,

How do you generate your CDF from your realisations? Do you simply weight each ensemble member equally? I think there are cases where you may weight by some measure of how 'good' you think the ensemble member is (some measure of its error, so that you downweight members with high errors). If you are storing the output from ensemble members in the file, then I think CF allows for this using the 'realization_weight' standard name, to store your errors/weights in the file.

Furthermore, you may want to know the sensitivity of your CDF to your error estimates, so you could have more than one CDF for the same variable, each based on a different way of deriving the errors/weights.
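
To make the weighting question concrete, here is a minimal Python sketch of how a percentile could be read off an equally weighted versus an error-weighted ensemble. Everything in it is an assumption made for illustration: the function name weighted_percentile, the member values and the weights are hypothetical, and the percentile definition (smallest member value whose cumulative weight reaches the requested fraction, with no interpolation) is only one of several in use. It is not taken from CF or from any particular forecasting system.

```python
def weighted_percentile(values, weights, q):
    """Return the q-th percentile (0-100) of values under the given weights.

    Hypothetical helper: picks the smallest value whose cumulative
    weight reaches q percent of the total weight (no interpolation).
    """
    # Sort members by value, keeping each weight attached to its member.
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    threshold = q / 100.0 * total

    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    # Only reached through rounding when q is 100.
    return pairs[-1][0]


# Hypothetical precipitation amounts from four ensemble members.
members = [0.0, 1.2, 3.5, 8.0]

# Equal weighting versus weights that favour the two wetter members.
equal_weights = [1.0, 1.0, 1.0, 1.0]
error_weights = [1.0, 1.0, 4.0, 4.0]

p25_equal = weighted_percentile(members, equal_weights, 25)
p25_error = weighted_percentile(members, error_weights, 25)
print(p25_equal, p25_error)
```

With these made-up numbers the 25th percentile moves from the driest member to the third member once the weights change, which is the kind of sensitivity to the weighting that a reader of the file cannot see unless the weights are recorded somewhere.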

Is this something CF needs to worry about, or is it a case of trying to add something that's not really needed yet? Or maybe this is not in scope for CF anyway, and it should be left to something more like 'audit/history/provenance' metadata?

Jamie

> -----Original Message-----
> From: cf-metadata-bounces at cgd.ucar.edu
> [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Vegard Bønes
> Sent: 15 November 2011 13:15
> To: Jonathan Gregory
> Cc: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] standards for probabilities
>
> Thank you, Jonathan! :)
>
> So, a bit more concrete, this is option 1:
>
> float rain_25(time, y, x);
> rain_25:standard_name = "precipitation_amount";
> rain_25:cell_methods = "realization: percentile(25)";
>
> The only problem I see with this is that in the resulting CDM,
> realization is not used anywhere, apart from possibly in cell
> methods. But maybe this is ok?
>
>
> If I understand the second option correctly, this would lead
> to something like this:
>
> float precipitation_amount(time, percentile, y, x); ...
> float percentile(percentile);
> percentile:units = "1";
> percentile:standard_name =
> "cumulative_distribution_function_of_precipitation_amount";
>
> But what is the purpose of explicitly referring to
> precipitation_amount in the standard name? Would not
> cumulative_distribution_function be better? Then the same
> dimension could be used for other data, such as air_temperature.
>
> Or, if we want to add something about the nature of the
> source data for the function, it could be called something
> like cumulative_distribution_function_due_to_realization?
>
>
> I am still a bit uncertain about which is best, though.
>
>
> -- Vegard
>
>
>
>
> ----- Original Message -----
> From: "Jonathan Gregory" <j.m.gregory at reading.ac.uk>
> To: "Vegard Bønes" <vegard.bones at met.no>
> Cc: cf-metadata at cgd.ucar.edu
> Sent: 15 November 2011 11:11:52
> Subject: Re: [CF-metadata] standards for probabilities
>
> Dear Vegard
>
> > I want to express such things as "25th percentile
> > precipitation amount" (based on ensemble data), and
> > probability that air temperature will be within 2.5 degrees
> > of the forecast. How should I do this?
>
> You are right, this case has not yet been dealt with,
> although the guidelines for construction of standard names
> foresee that needs like this might arise!
>
> If the quantity is a precipitation_amount, it's fine to use
> that standard name. The question is how to record that it is the
> 25th percentile. Two possible ways to do this would be:
>
> * To extend the possible syntax of cell_methods so that it
> can describe percentiles. It is already possible to indicate
> a median in cell_methods, and that is a particular
> percentile. The advantage of this way of doing it would be
> that you would record whether the distribution of
> precipitation amounts being considered was for
> time-variation, or spatial variation, or some other kind of
> variation. Obviously you could have a probability
> distribution with percentiles for many different independent
> variables.
>
> * To use a size-1 or scalar coordinate variable to record the
> probability, with a new standard_name, perhaps
> cumulative_distribution_function_of_precipitation_amount.
> The value of this coordinate would be 0.25 for the 25th
> percentile. The advantage of this method would be that you
> could have several different percentiles in the same
> variable, by having a multivalued probability coord.
> If you wanted to be specific about what the independent
> variable was, that would have to be included in the standard
> name as well e.g.
> cumulative_distribution_function_of_precipitation_amount_over_time.
>
> What do you think?
>
> Cheers
>
> Jonathan
> _______________________________________________
> CF-metadata mailing list
> CF-metadata at cgd.ucar.edu
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>
Received on Wed Nov 16 2011 - 05:18:32 GMT
