[CF-metadata] Return periods from Jonathan Gregory on 2014-09-08 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Jonathan Gregory <j.m.gregory>
Date: Mon, 8 Sep 2014 15:51:34 +0100

Dear Dan

I see - thanks for the explanation. I presume F(x) has been calculated not just
from the month concerned but from a longer period, which you haven't mentioned.
You are in effect using F(x) as a lookup function to convert precipitation
amount to probability. As we said before, I think that F(x) should generally
have a standard name of
cumulative_distribution_function_of_precipitation_amount,
but you would expect that to have a coordinate (i.e. an independent variable)
of x=precipitation_amount, which your field will not. You earlier suggested
that x could be an auxiliary coordinate variable, but I don't think that would
be appropriate, because aux coords do not store independent variables. They
depend on the independent variables stored in the coordinate variables.

If we don't want to (slightly mis)use
cumulative_distribution_function_of_precipitation_amount
for your purpose, I would suggest a standard_name of something like
precipitation_amount_converted_to_cumulative_probability
for your F(x(lon,lat)). I suggest this to capture the idea that it's really
another way of stating x(lon,lat), to which you have applied F as a known
function. I haven't used the phrase expressed_as, which occurs in other
standard names, because this conversion involves data, not just well-known
constants one can look up.
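
As a rough illustration of what "applying F as a known function" at each grid
point could look like, here is a minimal Python sketch. It assumes, purely for
illustration, that F is an empirical distribution built from a multi-year
record of the same calendar month at each location; the thread does not say
how F is actually estimated. The function names, array shapes and sample
numbers are all hypothetical. The second helper converts F to the signed
return period convention described in Dan's original message quoted below
(positive for the wet tail, negative for the dry tail); switching tails at
F = 0.5 is also an assumption.

```python
import numpy as np


def cumulative_probability(history, x):
    """Evaluate a per-grid-point empirical F at the local amount x.

    history: (nyears, nlat, nlon) amounts for the same calendar month
             (assumed input; stands in for however F is really derived).
    x:       (nlat, nlon) amounts for the month of interest.
    Returns a (nlat, nlon) field: the fraction of years with amount <= x.
    """
    history = np.asarray(history, dtype=float)
    x = np.asarray(x, dtype=float)
    # x broadcasts against the leading "years" axis, so every grid point is
    # compared only with its own record.
    return (history <= x).mean(axis=0)


def signed_return_period(F):
    """Convert F to years: N = 1/(1 - F) (wet tail) or N = -1/F (dry tail)."""
    F = np.asarray(F, dtype=float)
    with np.errstate(divide="ignore"):  # F == 0 or F == 1 gives +/-inf
        wet = 1.0 / (1.0 - F)
        dry = -1.0 / F
    # Assumption: use the wet-tail form when F >= 0.5, the dry-tail form below.
    return np.where(F >= 0.5, wet, dry)


# Hypothetical 1 x 2 grid with a 10-year record: a wet upland point and a
# drier lowland point.
years = np.arange(1, 11, dtype=float).reshape(10, 1, 1)
scale = np.array([[20.0, 10.0]])
history = years * scale            # upland 20..200 mm, lowland 10..100 mm
x = np.array([[90.0, 90.0]])       # the same 90 mm at both points

F = cumulative_probability(history, x)
N = signed_return_period(F)
```

With these made-up numbers the same 90 mm gives F = 0.4 at the upland point
and F = 0.9 at the lowland point, i.e. signed return periods of about -2.5 and
10 years. The result has the same (lat,lon) shape as x itself, with no x axis,
which is why it reads as x restated as a probability rather than as a full
F(x,lat,lon).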

What do you and others think?

Best wishes

Jonathan

----- Forwarded message from "Hollis, Dan" <dan.hollis at metoffice.gov.uk> -----

> From: "Hollis, Dan" <dan.hollis at metoffice.gov.uk>
> To: "Gregory, Jonathan" <j.m.gregory at reading.ac.uk>,
> "cf-metadata at cgd.ucar.edu" <cf-metadata at cgd.ucar.edu>
> Subject: RE: [CF-metadata] Return periods
> Date: Mon, 8 Sep 2014 13:47:49 +0000
>
> Hi Jonathan,
>
> We have two 2D fields that we would like to store - the first contains precipitation amount, x, for a specific month (e.g. Aug 2014), and the second contains the return period (or probability) of the precipitation amount in the first field i.e. F(x).
>
> The relationship between these two fields is at the grid point level i.e. the value at any given grid point in the second field is the value of F(x) corresponding to the value of x at the same location in the first field. Note that the relationship between the two quantities varies with location i.e. the return period of 100 mm in the Scottish mountains is very different to the return period of 100 mm in southeast England.
>
> As x is a floating point quantity the number of unique values in the first field will essentially equal the number of grid points (i.e. approximately 10000 for the UK). If the second field were to use x as a coordinate variable then the size of this coordinate would be 10000, and for any given grid point in space there would be a value of F(x) for only one value of this coordinate i.e. the 3D field F(x,lat,lon) would be very sparse.
>
> Looking at it another way, if we were intending to store fields of F(x) for a small number of fixed values of x (e.g. 10 mm, 20 mm, 30 mm, 40 mm etc) then I can see that having x as a coordinate variable would make sense. However what we actually want to do is store F(x) for a single value of x, but where the value of x is different for each location.
>
> Does that make any sense? I think I have it clear in my own mind but I find it quite hard to describe.
>
> Dan
>
>
>
> -----Original Message-----
> From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Jonathan Gregory
> Sent: 04 September 2014 17:40
> To: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] Return periods
>
> Dear Dan
>
> I don't think I have understood this. What is the field of precipitation
> amount? The F(x) would actually be a 3D data variable, I think, F(x,lat,lon),
> which gives the probability that precipitation is less than x at (lat,lon).
> For this field, x is a 1D coord variable, not a field.
>
> Best wishes
>
> Jonathan
>
>
> > Yes, that is what I had in mind. What slightly concerns me is that I would effectively end up storing the precipitation amount twice:
> >
> > - once as a data variable in its own right
> >
> > - once as an auxiliary coordinate, with F(x) as the data variable
> >
> > Duplication, especially within the same data archive, seems like something to be avoided if possible, hence my idea to have F(x) as the auxiliary coordinate variable and precipitation amount as the data variable and not store F(x) as a separate data variable. Do you think that it would be preferable/acceptable to store the precipitation values twice?
> >
> > Regards,
> >
> > Dan
> >
> >
> > -----Original Message-----
> > From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Jonathan Gregory
> > Sent: 04 September 2014 14:14
> > To: cf-metadata at cgd.ucar.edu
> > Subject: [CF-metadata] Return periods
> >
> > Dear Dan
> >
> > I agree with you that it would be better to store F(x) than to use your sign
> > convention for return periods. However it would be fine to split the return
> > periods into the two tails in different data variables and give them distinct
> > standard names. We have some standard names for such things e.g.
> > spell_length_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold
> > and you could propose suitable ones.
> >
> > If you store F(x), I think it would be a data variable, not a coordinate or
> > ancillary variable, and it should have a standard name. I believe the guidance
> > you quote is about probability distribution functions rather than cumulative
> > (probability) distribution functions. Following a similar approach, however,
> > we could have a standard name such as
> > cumulative_distribution_function_of_precipitation_amount
> > for F(x), where x is precipitation_amount, which would be a coordinate. Is
> > that what you have in mind?
> >
> > Cheers
> >
> > Jonathan
> >
> >
> > ----- Forwarded message from "Hollis, Dan" <dan.hollis at metoffice.gov.uk> -----
> >
> > > Dear all,
> > >
> > > Here is another question related to migrating our UK climate grids to NetCDF.
> > >
> > > As well as grids of the monthly rainfall total (in mm) we also generate grids of the estimated return period of the rainfall total (in years). Currently these two quantities are stored in separate files (with only the file name and location to tell us they are related). I've been trying to think how to store the return period information using CF-NetCDF and would be grateful for advice.
> > >
> > > Some further details:
> > >
> > > Our existing grids contain the return period in years i.e. if the return period for a particular grid point is N years then this means that we estimate that the rainfall total for that grid point will be exceeded on average once every N years. This is equivalent to saying that each year there is a probability of 1/N of exceeding that rainfall amount i.e. the cumulative distribution function, F(x) = 1 - 1/N. For example, if N = 10 then F(x) = 0.9. Additionally, as we are also interested in droughts, we have adopted our own convention of using negative values to refer to the left (dry) tail of the rainfall distribution. For example N = -10 is used to mean that F(x) = 0.1 i.e. we estimate that rainfall amounts *less* than the observed value will occur once every 10 years on average.
> > >
> > > This use of positive and negative values to indicate return periods relating to the right (wet) and left (dry) tails is convenient but unconventional. My initial thought is that we should store F(x) itself and only convert to return period for the purposes of presentation e.g. creating maps.
> > >
> > > So, how to store F(x)? The main problem is that the value to which the return period relates (i.e. the rainfall amount) varies from one grid point to another. Two possibilities occur to me, both of which involve storing F(x) alongside the rainfall total:
> > >
> > > - Store F(x) as an auxiliary coordinate
> > >
> > > - Store F(x) as ancillary data
> > >
> > > It's not clear to me whether one is better than the other, or even whether either approach is valid.
> > >
> > > The other question is what to call the F(x) values. The guidance for ancillary data says to use standard name modifiers to indicate the relationship, but there doesn't seem to be anything suitable for describing F(x).
> > >
> > > The other thing I've looked at is the guidance for constructing standard names. I can't seem to locate this on the current CF web site so I've referred to the archived copy available here:
> > >
> > > https://web.archive.org/web/20130728212039/http://cf-pcmdi.llnl.gov/documents/cf-standard-names/guidelines
> > >
> > > The section on transformations includes 'probability_distribution_of_X[_over_Z]' in the list, however it's unclear to me whether this is what I need, or even how I might use it in other circumstances. The notes state:
> > >
> > > "probability distribution (i.e. a number in the range 0.0-1.0 for each range of X) of variations (over Z) of X. The data variable should have an axis for X."
> > >
> > > The reference to 'each range of X' is the bit I find confusing. Is the idea to store F(X1), F(X2), F(X3) etc, or is it intended to be F(X2) - F(X1), F(X3) - F(X2), F(X4) - F(X3) etc? The former doesn't quite fit the description, but the latter has the problem that the number of ranges (= the number of data values) will be one less than the number of X values. I can't see any existing names that use this transformation to use as a guide.
> > >
> > > If anyone can help that would be much appreciated.
> > >
> > > Thanks,
> > >
> > > Dan
> > >
> > >
> > > Dan Hollis Climatologist
> > > Met Office Hadley Centre FitzRoy Road Exeter Devon EX1 3PB United Kingdom
> > > Tel: +44 (0)1392 886780 Fax: +44 (0)1392 885681
> > > E-mail: dan.hollis at metoffice.gov.uk Website: http://www.metoffice.gov.uk
> > > For UK climate and past weather information, visit http://www.metoffice.gov.uk/climate
> > >
> > >
> >
> > > _______________________________________________
> > > CF-metadata mailing list
> > > CF-metadata at cgd.ucar.edu
> > > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
> >
> > ----- End forwarded message -----
>
> ----- End forwarded message -----

----- End forwarded message -----
Received on Mon Sep 08 2014 - 08:51:34 BST
