Hi all,
Following the various discussions in this thread, I would like to request the following new standard name:
precipitation_amount_converted_to_cumulative_probability
With the following definition:
"Amount" means mass per unit area. A variable whose standard name has the form X_converted_to_cumulative_probability will contain a value of the cumulative distribution function of X i.e. the probability of observing a value of X less than or equal to the value of X defined by the cell bounds and cell methods. The variable must have a value between 0.0 and 1.0. The cell methods must describe the processing of quantity X prior to the conversion to probability.
The units would be '1'.
I'm not sure how good the above definition is, so I'd welcome suggestions for improvement.
I'd also like to propose the following addition to the Transformations section of the Guidelines for Construction of CF Standard Names:
X_converted_to_cumulative_probability
With units of '1' and the following meaning:
Cumulative distribution function of X, i.e. a value between 0.0 and 1.0 giving the probability of observing a value of X less than or equal to the value of X defined by the cell bounds and cell methods.
Again, I'd be grateful for suggestions on how to improve the wording to make the meaning clearer and unambiguous.
Thanks,
Dan
-----Original Message-----
From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Hollis, Dan
Sent: 11 September 2014 16:35
To: Gregory, Jonathan; cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] Return periods
Hi Jonathan,
Following our brief chat earlier this week, I think I have a better understanding of the right way to tackle this. For the record, here are the key points:
As described previously, the probability is a conversion of the precipitation amount. Storing both quantities _could_ be seen as redundant. However, the conversion process is non-trivial, so storing both is justified.
We _could_ store both quantities in the same file (as suggested below). However, this does not in itself establish any special link between the variables, other than making it easy to see that they share coordinates. As we plan to store all our other variables (temperature, wind speed, sunshine etc) in separate files, it makes sense to do the same for the precipitation probability rather than create an exception for one variable.
Your proposed standard name of "precipitation_amount_converted_to_cumulative_probability" might lead the user to ask 'which precipitation amount?'. Your recommendation is for the precipitation probability variable to have the same time bounds and cell method as the precipitation amount variable, e.g. bounds = "2014-08-01 09:00, 2014-09-01 09:00" (for Aug 2014) and cell_methods = "time: sum". The idea is that this would be sufficient to define which precipitation amount the probability relates to (although the user would have to seek out the precipitation amount field itself if they needed to know the actual values). I guess it would be important to declare in the definition that the cell method is applied *before* the conversion to cumulative probability.
Does this agree with what you had in mind?
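For illustration only, the arrangement in the key points above could be sketched as follows. This is a simplified Python representation, not real netCDF output: the dictionary layout, the "time_bounds" key and the helper describes_same_amount are hypothetical, and in an actual CF file the bounds would live in a separate bounds variable rather than in an attribute. The units "kg m-2" are assumed from "amount" meaning mass per unit area.

```python
# Hypothetical, simplified stand-ins for the metadata of the two variables.
# In a real CF file the time bounds are held in a bounds variable, not an attribute.
TIME_BOUNDS = ("2014-08-01 09:00", "2014-09-01 09:00")  # Aug 2014, from the example above

precipitation_amount = {
    "standard_name": "precipitation_amount",
    "units": "kg m-2",            # assumed: mass per unit area
    "cell_methods": "time: sum",
    "time_bounds": TIME_BOUNDS,
}

precipitation_probability = {
    # Proposed name, not an accepted standard name at the time of writing.
    "standard_name": "precipitation_amount_converted_to_cumulative_probability",
    "units": "1",
    # The cell method describes the processing of the amount *before* conversion.
    "cell_methods": "time: sum",
    "time_bounds": TIME_BOUNDS,
}


def describes_same_amount(amount_attrs, probability_attrs):
    """Return True if both variables share the same cell methods and time bounds."""
    keys = ("cell_methods", "time_bounds")
    return all(amount_attrs[key] == probability_attrs[key] for key in keys)


print(describes_same_amount(precipitation_amount, precipitation_probability))
```

The check only compares metadata; the user would still need the precipitation amount field itself to recover the actual values.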
Regarding standard names, I shall request the name you suggested unless anyone else has other ideas. However, I also have two general questions related to this:
Given that cumulative probabilities may be of general interest to other users, would it be helpful to add "X_converted_to_cumulative_probability" to the list of transformations in the Guidelines for Standard Names?
Given that the meaning of each transformation is defined in the Guidelines, is it necessary to request a new standard name if I am simply combining an existing transformation with an existing standard name?
The reason for my second question is that I can see many examples of standard names that incorporate transformations (e.g. change_over_time_in_X, direction_of_X, divergence_of_X etc). Would it not be better practice to define only untransformed quantities, and then allow users to combine these with any of the defined transformations without needing to add to the standard name table?
Regards,
Dan
-----Original Message-----
From: CF-metadata [mailto:cf-metadata-bounces at cgd.ucar.edu] On Behalf Of Jonathan Gregory
Sent: 09 September 2014 11:28
To: cf-metadata at cgd.ucar.edu
Subject: Re: [CF-metadata] Return periods
Dear Dan
Yes, I see what you mean regarding the aux coord, and it's a neat idea, but
it doesn't seem quite right to me. Aux coords are alternative or additional
information. The lat(x,y) and lon(x,y) coordinates provide an alternative way
to locate the point (x,y), in a different coordinate system. The precipitation
probability, however, would determine the precipitation entirely. There isn't
any coordinate information which would give you the precipitation amount. That
is why I don't think the probability can be an aux coord. Does that make sense?
> You are right regarding the calculation - we are using a statistical model of the relationship between monthly rainfall and return period that was developed many years ago by a colleague from an analysis of 60 years of historical data. The model uses values of the coefficients of variation and skewness to describe the distribution of monthly rainfall (assumed to be log-normal). To capture how the shape of the distribution varies with location we have pre-calculated values of these coefficients available at each point on a 5 km grid.
Right. So it is reasonable to describe it as a conversion of precipitation
amount to probability, I think.
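As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It assumes a plain two-parameter log-normal distribution specified by a mean and a coefficient of variation. It is not the model described above, which also uses skewness and gridded coefficients; the function name and its parameters are hypothetical.

```python
import math


def precipitation_amount_to_cumulative_probability(amount, mean, cv):
    """Return P(X <= amount) for a two-parameter log-normal X.

    amount: precipitation amount (same units as mean, e.g. kg m-2)
    mean:   long-term mean amount at this location (assumed input)
    cv:     coefficient of variation, i.e. standard deviation / mean (assumed input)
    """
    if amount <= 0.0:
        return 0.0
    # Standard relations between the mean/CV of X and the parameters of ln(X).
    sigma_squared = math.log(1.0 + cv * cv)
    sigma = math.sqrt(sigma_squared)
    mu = math.log(mean) - 0.5 * sigma_squared
    # Normal CDF of the standardised log amount.
    z = (math.log(amount) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Example call with made-up numbers: a monthly total of 120 where the
# long-term mean is 80 and the coefficient of variation is 0.5.
probability = precipitation_amount_to_cumulative_probability(120.0, 80.0, 0.5)
print(probability)
```

The result always lies between 0.0 and 1.0 and increases with the amount, which matches the proposed meaning of the converted quantity.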
> If a new standard name is required then I'm happy to take your advice on a suitable choice.
It would be useful to know if anyone else reading this has a view on my
suggestion of precipitation_amount_converted_to_cumulative_probability.
> What is still not clear to me is how I maintain a clear link between the two fields without storing some of the information twice. Is it simply a case of storing two variables in the same NetCDF file (so that they share coordinates)?
If they are in the same file, indeed it is obvious if the fields have the same
spatiotemporal coordinates, because they share the coord vars, as you say. If
they are in different files, the data-user has to check whether the coords are
the same. There is no convention which would allow one to be sure about that
without checking. CF does not rely on variable names, for instance. This is a
very common situation, in fact. For instance, in the CMIP archives each
quantity is in a separate file, and the data variables in many files typically
have the same spatiotemporal coordinates, but analysis software cannot be sure
of that without checking.
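The kind of check meant here could look like the following minimal Python sketch. It assumes the coordinate values have already been read from each file into dictionaries mapping a coordinate name to a sequence of numbers; that layout and the function name coords_match are hypothetical, and reading the files is left out.

```python
import math


def coords_match(coords_a, coords_b, rel_tol=1e-9, abs_tol=0.0):
    """Return True if two sets of coordinates agree within a tolerance.

    coords_a, coords_b: dicts mapping a coordinate name (e.g. "time", "x", "y")
    to a sequence of numeric coordinate values (hypothetical layout).
    """
    if coords_a.keys() != coords_b.keys():
        return False
    for name, values_a in coords_a.items():
        values_b = coords_b[name]
        if len(values_a) != len(values_b):
            return False
        for a, b in zip(values_a, values_b):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Made-up coordinates standing in for two separate files.
amount_coords = {"x": [0.0, 5000.0, 10000.0], "y": [0.0, 5000.0]}
probability_coords = {"x": [0.0, 5000.0, 10000.0], "y": [0.0, 5000.0]}
print(coords_match(amount_coords, probability_coords))
```

Comparing values rather than variable names is consistent with CF not relying on variable names.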
Best wishes
Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata at cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Received on Thu Sep 18 2014 - 08:58:12 BST