[CF-metadata] CF and a representation of probabilistic forecasts
Hello,
there are a couple of projects (Hadley Centre QUMP project and climateprediction.net) that will be distributing
probabilistic forecasts of climate change based on ensembles of model runs. One way of representing these
results will be a set of model runs and a set of weights that should be applied to each model run.
I think this can be accommodated straightforwardly, and reasonably generally, in the CF standard
using 'ancillary_variables' and the addition of
1) New standard name 'sample' used to label a dimension that can be thought
of as a statistical sample (e.g. ensemble)
2) New standard name modifier 'sample_weight' used to label variables that are acting
as weights for other quantities
e.g. coarse map of predictions of 21st century temperature change
In this example temperature is dimensioned by sample as well as the normal space and time dimensions. Each
sample is the result of one model run. Some models are less realistic than others and so should be down-weighted
in any subsequent analysis. The weights variable gives the weight for each ensemble member.
dimensions:
lat = 18 ;
lon = 36 ;
time = 10 ;
sample = 10000 ; // sample points
variables:
float temp(sample,time,lat,lon) ; // each sample is the result from one ensemble member
temp:long_name = "Temperature at 1.5m" ;
temp:standard_name = "air_temperature" ;
temp:ancillary_variables = "weights" ;
temp:source = "perturbed physics ensemble of HadSM3" ;
float weights(sample) ; // the weight applied to each ensemble member
weights:long_name = "likelihood weights for 1.5m air temperature" ;
weights:standard_name = "air_temperature sample_weight" ;
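
For illustration only, here is a minimal Python sketch of how a reader of such a file might apply the weights to the samples at a single grid cell and time step. The function names and the numbers are made up for the example and are not part of the proposal. The weights are divided by their sum, so they need not be normalised beforehand.

```python
def weighted_mean(values, weights):
    """Weighted mean of one value per sample; weights need not sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalised weight reaches q (0 < q <= 1)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    # Walk through the samples in order of increasing value.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= q:
            return value
    return max(values)  # guard against rounding when q is close to 1


# Hypothetical temperature changes (K) at one grid cell and time, one per sample.
temp_at_point = [1.0, 2.0, 3.0, 4.0]
# Hypothetical likelihood weights, one per sample, in the same sample order.
weights = [0.125, 0.375, 0.375, 0.125]

mean_change = weighted_mean(temp_at_point, weights)
median_change = weighted_quantile(temp_at_point, weights, 0.5)
```

The only thing the code relies on is that the values and the weights are listed in the same sample order.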
Notes:
1. The sample points can be generated from a perturbed physics ensemble or a detection and attribution
exercise (or possibly some other statistical method), so I don't think you want to explicitly use the term
'ensemble'. 'sample' is better (though potentially confusing with grab samples or bucket samples;
maybe 'distribution_sample' is a better name?).
2. If the sample dimension is not identified by its standard name then there is an implied rule that
the software has to infer which dimension to apply the weights to based on the common dimension.
3. sample_weight variables have an implied valid_min=0, and valid_max=1.
(although the valid_max may be relaxed if you are prepared to renormalise later)
4. The 'ancillary_variables' attribute may point to more than one sample_weight. This might represent
different sensitivity studies, different observations used for skill scores, or different methodologies.
In this case each sample_weight should be thought of as applied on its own. They are not applied in sequence.
5. The same sample_weight variable can be referenced by more than one variable. This is useful for forming
joint (multidimensional) pdfs between variables. In this case although the ordering of the samples is
arbitrary it must be used consistently: the same order should be used for all variables in the file.
6. The creation method of the sample points and associated weights should be left to a description in
the 'source' attribute (which may refer to a URL for more information). In the case of perturbed physics
ensembles the derivation of weights can be complex, so a reference to external documents describing the method
will avoid unnecessarily overloading the usage metadata.
7. In other examples the weights might be a function of space and time as well as sample member.
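
As a rough sketch of notes 2, 3 and 7, software reading the file could match the weights to the data by their common dimension names and renormalise the weights before use. The helper names below are hypothetical and the dimension names are those of the CDL example above; this is one possible reading of the implied rule, not a definitive implementation.

```python
def shared_dimensions(data_dims, weight_dims):
    """Return the dimensions of the data variable that the weights also use."""
    return [dim for dim in data_dims if dim in weight_dims]


def normalise_weights(weights):
    """Reject negative weights, then rescale so the weights sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("sample weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one sample weight must be positive")
    return [w / total for w in weights]


# Dimension names of temp(sample,time,lat,lon) from the CDL example.
temp_dims = ("sample", "time", "lat", "lon")

# Note 2: weights(sample) shares only the sample dimension with temp.
per_sample = shared_dimensions(temp_dims, ("sample",))

# Note 7: hypothetical weights that also vary in space.
per_sample_and_space = shared_dimensions(temp_dims, ("sample", "lat", "lon"))

# Note 3: weights with a relaxed valid_max, rescaled before use.
normalised = normalise_weights([2.0, 1.0, 1.0])
```

If the sample dimension carried the proposed standard name, the matching step would not have to rely on dimension names at all.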
I hope this all makes enough sense for people to make a judgement on whether this should be accepted or not.
Obviously if I've been unclear let me know and I'll try and be more eloquent. If this all makes sense there
will be a few follow-up e-mails with specific requests for standard names.
There are a couple of other representations of probabilistic forecasts that might be used. These can be posted
as separate suggestions as and when needed.
Thanks,
Jamie
Received on Wed May 03 2006 - 03:30:01 BST