[CF-metadata] CF and a representation of probalistic forecasts

From: Kettleborough, Jamie <jamie.kettleborough>
Date: Mon, 15 Jan 2007 13:26:19 +0000

Hello,

I think we need to return to this posting
(http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/000947.html) as I
don't think it covers everything it needs to. This is the posting that
was the precursor to the whole ensemble debate. It introduced the
'realization' dimension (the original post called it 'sample'),
though it may have done so in an 'under the radar' way.

My attempt at the requirements would go something like:

1. A common way of representing a probability distribution is through a
set of sample points with an associated weight on each sample point. CF
needs a way to accommodate this representation of probability
distributions in support of uncertainty analysis (e.g. of weather and
climate forecasts [though there may be other applications in the
representation of non-Gaussian errors in obs in data assimilation? - I
don't know much about this, but suspect there _may_ be overlap]).
2. The distributions can be multivariate (e.g. a joint pdf on
temperature and precipitation).
3. There are different methods for calculating weights and data
producers may provide more than one set of weights (e.g. based on
different sets of observations), or data users may want to use their own
sets of weights.
4. The representation should provide the information required so that
applications can calculate the distribution moments (mean, variance,
etc.), summary statistics (mode, quartiles) etc. (Steve's requirement 4
in http://www.cgd.ucar.edu/pipermail/cf-metadata/2006/001403.html).
I think this functionality actually drops out of choosing to
represent the pdf as a set of sample points and weights, and is not
really dependent on how you represent these within CF (though of course
some ways will be easier to handle and more efficient than others).
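
To illustrate requirement 4, here is a minimal Python sketch of how an
application might derive moments and quantiles once it has the sample
points and one set of weights in hand. It is not part of any CF proposal:
the function names are hypothetical, the values are assumed to be plain
sequences already read from the file, and the quantile rule (smallest
sample whose cumulative weight reaches q) is just one of several
reasonable conventions.

```python
from bisect import bisect_left
from itertools import accumulate


def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def weighted_mean(samples, weights):
    """First moment of the distribution described by samples and weights."""
    return sum(p * x for p, x in zip(normalise(weights), samples))


def weighted_variance(samples, weights):
    """Second central moment (no small-sample correction applied)."""
    probs = normalise(weights)
    mean = sum(p * x for p, x in zip(probs, samples))
    return sum(p * (x - mean) ** 2 for p, x in zip(probs, samples))


def weighted_quantile(samples, weights, q):
    """Smallest sample value whose cumulative weight reaches q."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    pairs = sorted(zip(samples, normalise(weights)))
    cumulative = list(accumulate(p for _, p in pairs))
    # Clamp the index in case rounding leaves the final sum just below q.
    index = min(bisect_left(cumulative, q), len(pairs) - 1)
    return pairs[index][0]


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]   # one value per sample point
    weights = [1, 2, 3, 4]          # unnormalised weights
    print(weighted_mean(values, weights))
    print(weighted_variance(values, weights))
    print(weighted_quantile(values, weights, 0.5))
```

Because the weights are renormalised inside each function, the same code
works whether the producer supplies weights summing to 1 or not, and a
user can swap in their own set of weights (requirement 3) without touching
the sample values.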

(I think this is (at least logically) a superset of the ensemble
problem: ensembles are just a special case of these pdf representations
with additional requirements related to data provenance and the ability
to subselect samples based on their provenance - and of course these
issues introduce their own requirements e.g. related to distributed data
sets and data set aggregation.)

The thing I missed in the original posting is that the pdfs can be
multivariate involving two (or more) quantities with different standard
names. This means we may have to support standard names of the form

   weight:standard_name="air_temperature precipitation_amount realization_weight";

This could get unwieldy as you deal with joint distributions of many
quantities. I guess this also has implications for the standard_name
table - so I thought it was worth raising.
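
To make the multivariate case concrete, here is a rough Python sketch of
how a single set of weights, shared by two quantities that use the same
sample ordering, yields joint statistics. The function names and the
example numbers are made up for illustration and are not taken from any
real data set or from the CF standard.

```python
def joint_probability(x, y, weights, event):
    """Weighted probability that event(x_i, y_i) holds across the samples."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    hit = sum(w for xi, yi, w in zip(x, y, weights) if event(xi, yi))
    return hit / total


def weighted_covariance(x, y, weights):
    """Weighted covariance of two quantities sharing one sample dimension."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    mean_x = sum(w * xi for xi, w in zip(x, weights)) / total
    mean_y = sum(w * yi for yi, w in zip(y, weights)) / total
    return sum(
        w * (xi - mean_x) * (yi - mean_y)
        for xi, yi, w in zip(x, y, weights)
    ) / total


if __name__ == "__main__":
    # Illustrative values only: one entry per sample point, same order in each.
    temperature = [1.0, 2.0, 3.0, 4.0]
    precipitation = [10.0, 8.0, 6.0, 4.0]
    weights = [1.0, 1.0, 1.0, 1.0]
    warm_and_dry = lambda t, p: t > 2.0 and p < 7.0
    print(joint_probability(temperature, precipitation, weights, warm_and_dry))
    print(weighted_covariance(temperature, precipitation, weights))
```

The point of the sketch is that the weight belongs to the sample point,
not to either quantity on its own, which is why naming it after every
quantity in the joint distribution becomes awkward.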

Jamie


On Wed, 2006-05-03 at 10:30 +0100, Kettleborough, Jamie wrote:
> Hello,
>
> there are a couple of projects (Hadley Centre QUMP project and climateprediction.net) that will be distributing
> probabilistic forecasts of climate change based on ensembles of model runs. One way of representing these
> results will be a set of model runs and a set of weights that should be applied to each model run.
> I think this can be accommodated straightforwardly, and reasonably generally, in the CF standard
> using 'ancillary_variables' and the addition of
>
> 1) New standard name 'sample' used to label a dimension that can be thought
> of as a statistical sample (e.g. ensemble)
>
> 2) New standard name modifier 'sample_weight' used to label variables that are acting
> as weights for other quantities
>
> e.g. coarse map of predictions of 21st century temperature change
> In this example temperature is dimensioned by sample as well as the normal space and time dimensions. Each
> sample is the result of one model run. Some models are less realistic than others and so should be down-weighted
> in any subsequent analysis. The weights variable gives the weight for each ensemble member.
>
> dimensions:
>   lat = 18 ;
>   lon = 36 ;
>   time = 10 ;
>   sample = 10000 ; // sample points
> variables:
>   float temp(sample,time,lat,lon) ; // each sample is the result from one ensemble member
>     temp:long_name = "Temperature at 1.5m" ;
>     temp:standard_name = "air_temperature" ;
>     temp:ancillary_variables = "weights" ;
>     temp:source = "perturbed physics ensemble of HadSM3" ;
>   float weights(sample) ; // the weight applied to each ensemble member
>     weights:long_name = "likelihood weights for 1.5m air temperature" ;
>     weights:standard_name = "air_temperature sample_weight" ;
>
> Notes:
> 1. The sample points can be generated from a perturbed physics ensemble or a detection attribution
> exercise (or possibly some other statistical method) so I don't think you want to explicitly use the term
> 'ensemble'. 'sample' is better. (though potentially confusing with grab samples or bucket samples?
> - maybe 'distribution_sample' is a better name?)
> 2. If the sample dimension is not identified by its standard name then there is an implied rule that
> the software has to infer which dimension to apply the weights to based on the common dimension.
> 3. sample_weight variables have an implied valid_min=0, and valid_max=1.
> (although the valid_max may be relaxed if you are prepared to renormalise later)
> 4. The 'ancillary_variables' attribute may point to more than one sample_weight. This might represent
> different sensitivity studies, different observations used for skill scores, or different methodologies.
> In this case each sample_weight should be thought of as applied standalone. They are not applied in sequence.
> 5. The same sample_weight variable can be referenced by more than one variable. This is useful for forming
> joint (multidimensional) pdfs between variables. In this case although the ordering of the samples is
> arbitrary it must be used consistently: the same order should be used for all variables in the file.
> 6. The creation method of the sample points and associated weights should be left to a description in
> the 'source' attribute (which may refer to a URL for more information). In the case of perturbed physics
> ensembles the derivation of weights can be complex so reference to external documents to describe the method
> will avoid unnecessarily overloading the usage metadata.
> 7. In other examples the weights might be a function of space and time as well as sample member.
>
> I hope this all makes enough sense for people to make a judgement on whether this should be accepted or not.
> Obviously if I've been unclear let me know and I'll try and be more eloquent. If this all makes sense there
> will be a few follow up e-mails with specific requests for standard names.
>
> There are a couple of other representations of probabilistic forecasts that might be used. These can be posted
> as separate suggestions as and when needed.
>
> Thanks,
>
> Jamie
Received on Mon Jan 15 2007 - 06:26:19 GMT
