
[CF-metadata] cf standards for probabilities

From: Vegard Bønes <vegard.bones>
Date: Mon, 18 Nov 2013 06:40:56 +0000 (UTC)

Hi, all!

The CF metadata standard lacks specifications for dealing with probabilities. To remedy this, we would like to propose the following additions to the standard. This proposal is largely based on an earlier discussion on this list from 2011. That discussion is in the archive, with the first message here: http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2011/049412.html

This suggestion consists of three separate parts: a new standard_name modifier, a new standard_name, and conventions for handling more complex cases.


1) A new standard name modifier: confidence

The simplest cases may be expressed using a new standard name modifier, confidence. It changes the variable's meaning so that it expresses our confidence in another variable's correctness. Appending this modifier to a standard name always changes the variable's units to 1 or something equivalent. A value of 1 means high confidence, and 0 means no confidence.

The modifier may be used like this:

float wind_speed(time, latitude, longitude) ;
  wind_speed:units = "m/s" ;
  wind_speed:standard_name = "wind_speed" ;
  wind_speed:ancillary_variables = "wind_speed_confidence" ;

float wind_speed_confidence(time, latitude, longitude) ;
  wind_speed_confidence:units = "1" ;
  wind_speed_confidence:standard_name = "wind_speed confidence" ;

In the second variable, the units have changed from "m/s" to "1", and the standard_name contains the new modifier. This means that the data expresses confidence that wind_speed is correct for each time and grid point. Note that in this case, we do not explicitly specify what we mean by "correct".
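As a rough illustration of how a reader of such a file might use the modifier, here is a minimal Python sketch. It assumes the variable attributes have already been read into plain dictionaries. The dictionary layout and the helper names split_standard_name and confidence_variables_for are hypothetical and not part of the proposal. It relies only on the existing CF rule that a modifier follows the standard name after a blank.

```python
def split_standard_name(value):
    """Split a standard_name attribute into (name, modifier or None)."""
    parts = value.split()
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"unexpected standard_name: {value!r}")


def confidence_variables_for(name, variables):
    """Return the ancillary variables that carry confidence for `name`."""
    attrs = variables[name]
    base, _ = split_standard_name(attrs["standard_name"])
    found = []
    for anc in attrs.get("ancillary_variables", "").split():
        anc_base, modifier = split_standard_name(variables[anc]["standard_name"])
        # Match on the same base name plus the proposed "confidence" modifier.
        if anc_base == base and modifier == "confidence":
            found.append(anc)
    return found


# Attributes copied from the wind_speed example; the dict layout is hypothetical.
variables = {
    "wind_speed": {
        "units": "m/s",
        "standard_name": "wind_speed",
        "ancillary_variables": "wind_speed_confidence",
    },
    "wind_speed_confidence": {
        "units": "1",
        "standard_name": "wind_speed confidence",
    },
}

print(confidence_variables_for("wind_speed", variables))
```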


2) A new standard_name: cumulative_distribution_function_over_realization

This is intended to be used when specifying various scenarios as percentiles of the data. A variable with this standard name holds a list of quantiles (such as percentiles), and may be used as a coordinate dimension for other variables. It may for example look like this:

float percentile(percentile) ;
  percentile:units = "%" ;
  percentile:standard_name = "cumulative_distribution_function_over_realization" ;
  
float air_temperature_percentiles(time, percentile, latitude, longitude) ;
  air_temperature_percentiles:units = "K" ;
  air_temperature_percentiles:standard_name = "air_temperature" ;

The percentile dimension may for example contain these values: 10, 25, 50, 75, 90. air_temperature_percentiles would then contain data for five different cases.
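To make the percentile dimension concrete, the sketch below fills it for a single time and grid point from a small ensemble. The member values, the helper name percentile_of, and the choice of linear interpolation between sorted members are all assumptions made for illustration; the proposal does not say how the percentiles are computed.

```python
def percentile_of(sorted_values, p):
    """Linearly interpolated percentile p (0-100) of an ascending list."""
    rank = (p / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    step = sorted_values[upper] - sorted_values[lower]
    return sorted_values[lower] + fraction * step


# Values of the percentile coordinate variable, as in the text above.
percentile = [10, 25, 50, 75, 90]

# Hypothetical ensemble members (K) for one time and grid point.
members = [274.0, 271.0, 273.0, 275.0, 272.0]

ordered = sorted(members)
air_temperature_percentiles = [percentile_of(ordered, p) for p in percentile]
print(air_temperature_percentiles)
```

Each entry of air_temperature_percentiles corresponds to one index along the percentile dimension.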


3) Conventions for intervals

In some cases, it may be necessary to be more specific when stating confidence in a given value. What do we mean when we say that we are 79% certain that air temperature will be 16 degrees? We may for instance want to say that we are 79% certain that temperature will be within +/- 1 degree of 16 degrees, and 93% certain that it will be within +/- 2 degrees. For this purpose we introduce a convention: intervals.

Using intervals does not require any new standard names or modifiers. Instead, we use bounds to specify the ranges for data. Here is an example, using air_temperature, where we give confidence that a temperature forecast is within +/- 1.5 or +/- 2.5 degrees:

float temperature_bounds(interval_of_air_temperature, confidence_bounds) ; // values: [-1.5, 1.5, -2.5, 2.5]
  temperature_bounds:long_name = "bounds of temperature - for confidence variables" ;

float interval_of_air_temperature(interval_of_air_temperature) ; // values: [1.5, 2.5]
  interval_of_air_temperature:bounds = "temperature_bounds" ;
  interval_of_air_temperature:units = "K" ;
  interval_of_air_temperature:long_name = "air_temperature offset from a given value (in either direction)" ;

float air_temperature_confidence(time, interval_of_air_temperature, latitude, longitude) ;
  air_temperature_confidence:units = "1" ;
  air_temperature_confidence:standard_name = "air_temperature confidence" ;
  air_temperature_confidence:long_name = "probability of air_temperature within +/- interval_of_air_temperature" ;

float air_temperature(time, latitude, longitude) ;
  air_temperature:units = "K" ;
  air_temperature:standard_name = "air_temperature" ;
  air_temperature:ancillary_variables = "air_temperature_confidence" ;

There are a few things to note here:

  * air_temperature_confidence uses the new standard name modifier, confidence.
  * air_temperature_confidence has an extra dimension, interval_of_air_temperature. This specifies the range of air temperature, relative to the forecast, that each confidence value applies to.
  * interval_of_air_temperature, in turn, specifies a bounds variable, which gives the exact temperature offset in each direction.
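One possible way to produce such confidence values is to count the fraction of ensemble members that fall inside each interval around the forecast. This is only an assumption for the sake of the example, since the proposal does not prescribe how confidence is derived. The member values and the function name interval_confidence are hypothetical; temperature_bounds mirrors the bounds variable above.

```python
def interval_confidence(members, forecast, bounds):
    """Fraction of members within [forecast + low, forecast + high] per interval."""
    result = []
    for low, high in bounds:
        inside = [m for m in members if forecast + low <= m <= forecast + high]
        result.append(len(inside) / len(members))
    return result


# One (low, high) pair per entry of interval_of_air_temperature.
temperature_bounds = [(-1.5, 1.5), (-2.5, 2.5)]

# Hypothetical forecast and ensemble members for one time and grid point.
air_temperature = 16.0
members = [14.2, 15.1, 16.0, 16.9, 18.2]

air_temperature_confidence = interval_confidence(
    members, air_temperature, temperature_bounds
)
print(air_temperature_confidence)
```

The result has one value per interval, matching the extra interval_of_air_temperature dimension on air_temperature_confidence.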


Specifying lower limits of precipitation

We may also want to express the probability that some value will be above or below a certain threshold. A similar construct may be used for this. Here is an example, where we express confidence that the amount of rain for a period will be above a certain threshold:

float precipitation_bounds(lower_limit_of_precipitation, confidence_bounds) ; // values: [0.1,inf, 0.2,inf, 0.5,inf, 1,inf, 2,inf, 5,inf]
  precipitation_bounds:long_name = "bounds of precipitation - for confidence variables" ;

float lower_limit_of_precipitation(lower_limit_of_precipitation) ; // values: [0.1, 0.2, 0.5, 1, 2, 5]
  lower_limit_of_precipitation:bounds = "precipitation_bounds" ;
  lower_limit_of_precipitation:units = "kg/m2" ;
  lower_limit_of_precipitation:long_name = "lower limit of precipitation" ;

float precipitation_limit_confidence(time, lower_limit_of_precipitation, latitude, longitude) ;
  precipitation_limit_confidence:units = "1" ;
  precipitation_limit_confidence:standard_name = "precipitation_amount confidence" ;
  precipitation_limit_confidence:long_name = "probability of precipitation_amount above lower_limit_of_precipitation" ;
  precipitation_limit_confidence:cell_methods = "time: sum" ;

In this case, we use infinity as the upper bound for precipitation. In cases where the type of the bounds variable has no special infinity value, such as int, the maximum value of that type should be used.
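The threshold case can be sketched in the same way, with infinity as the upper bound of every cell. As before, estimating the confidence as a fraction of ensemble members is an assumption, and the member values and the helper names upper_bound_for and exceedance_confidence are hypothetical. The integer fallback uses the maximum of a 32-bit signed int, which is what the CDL type int holds.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed int (CDL "int")


def upper_bound_for(type_name):
    """Upper bound meaning "no limit": inf for floating types, else the int max."""
    return math.inf if type_name in ("float", "double") else INT32_MAX


def exceedance_confidence(members, bounds):
    """Fraction of members inside each [low, high] cell."""
    return [
        sum(1 for m in members if low <= m <= high) / len(members)
        for low, high in bounds
    ]


# Values of the lower_limit_of_precipitation coordinate, in kg/m2.
lower_limit_of_precipitation = [0.1, 0.2, 0.5, 1, 2, 5]
precipitation_bounds = [
    (limit, upper_bound_for("float")) for limit in lower_limit_of_precipitation
]

# Hypothetical accumulated precipitation from five ensemble members.
members = [0.0, 0.15, 0.3, 1.2, 6.0]

precipitation_limit_confidence = exceedance_confidence(members, precipitation_bounds)
print(precipitation_limit_confidence)
```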

Any comments?


-- Vegard
Received on Sun Nov 17 2013 - 23:40:56 GMT
