On 2014-09-09 Jonathan Gregory <j.m.gregory at reading.ac.uk> commented:
>> You are right regarding the calculation - we are using a statistical model
>> of the relationship between monthly rainfall and return period that was
>> developed many years ago by a colleague from an analysis of 60 years
>> of historical data.
[...]
> Right. So it is reasonable to describe it as a conversion of precipitation
> amount to probability, I think.
> It would be useful to know if anyone else reading this has a view on my
> suggestion of precipitation_amount_converted_to_cumulative_probability.
Yes, I have a slightly more general view of this. I think it is not so useful to try to include the whole, or part, of the data processing that produced the values _as part of the variable name itself_. For many practical applications, what matters most is what the data _is_, not where it came from or the details of how it came about.
Thus, taking this particular case as an example, I would rather have preferred something more direct, like one of these:
precipitation_amount_cumulative_probability
precipitation_cumulative_probability
cumulative_precipitation_probability
The reason I think this is a better idea is that we can easily imagine alternative approaches for arriving at the same desired quantity. E.g. there might be one process that does something like this (a rough sketch of this first approach follows below):
1. Measure precipitation.
2. Run statistics on the measurements and come up with a "probability" estimate.
However, alternatives may exist, such as this one:
1. Make use of collected measurements (or results from other models) describing a set of properties other than precipitation.
2. Apply some statistical approach to this, which may then predict a "probability" directly.
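To make the distinction concrete, here is a minimal, hypothetical sketch of the first kind of process: estimating a cumulative probability for a given monthly precipitation amount directly from a historical record. The gamma fit, the synthetic record and all names in it are my own illustrative assumptions, not the actual model referred to earlier in this thread.

    # Hypothetical sketch of the first approach: derive a cumulative
    # probability for a monthly precipitation amount from a historical
    # record of measurements. The gamma distribution is a common choice
    # for monthly rainfall, but the actual model discussed above may differ.
    import numpy as np
    from scipy import stats

    # Illustrative "historical record" of monthly precipitation totals (mm),
    # here simply synthesised for the example (60 years of monthly values).
    rng = np.random.default_rng(42)
    historical_mm = rng.gamma(shape=2.0, scale=40.0, size=60 * 12)

    # Fit a gamma distribution to the record, with location fixed at zero.
    shape, loc, scale = stats.gamma.fit(historical_mm, floc=0.0)

    def cumulative_probability(amount_mm: float) -> float:
        """Probability that monthly precipitation does not exceed amount_mm."""
        return stats.gamma.cdf(amount_mm, shape, loc=loc, scale=scale)

    print(cumulative_probability(120.0))  # cumulative probability for 120 mm

A downstream consumer of the resulting variable only needs the probabilities themselves; it makes no difference to that consumer whether they were produced this way or by the second kind of process.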
My point about all of this, though, is that for the next model "downstream" of the above, the only thing that matters may be that it needs the probabilities as input to its calculations. Introducing the specifics of where the data came from therefore adds nothing but semantic distraction (and complexity) to that model.
Please don't get me wrong on this, though: I'm not suggesting that the additional information about the processing is irrelevant as such! In fact, I think it could be highly relevant in many contexts. However, there are other mechanisms in NetCDF/CF that already convey this, and they seem better suited for that kind of information. I'm thinking in particular of the "comment", "history" and "source" attributes.
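As a rough illustration of what I mean, the provenance could be recorded along these lines with the netCDF4 Python library; the file name, variable name and attribute texts here are invented for the example and are not meant as a recommendation of any specific wording.

    # Illustrative use of the CF "source", "history" and "comment" attributes
    # to record provenance, rather than encoding it in the standard name.
    # File name, variable name and attribute texts are invented for this example.
    from datetime import datetime, timezone
    from netCDF4 import Dataset

    with Dataset("precip_probability.nc", "w") as nc:
        nc.createDimension("time", None)
        var = nc.createVariable("precip_cum_prob", "f4", ("time",))
        var.long_name = "cumulative probability of monthly precipitation amount"
        var.units = "1"

        # The "how" goes into attributes, not into the variable name:
        nc.source = "Statistical model relating monthly rainfall to return period"
        nc.history = (datetime.now(timezone.utc).isoformat()
                      + ": converted precipitation amount to cumulative probability")
        var.comment = ("Derived from 60 years of historical rainfall observations "
                       "using a rainfall/return-period relationship.")

The point is simply that the "how" lives in attributes, while the name only says "what" the variable is.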
I did a quick scan through v27 of the standard name table, and I could only locate a couple of names that hint at which processing has been applied: those are "eastward_transformed_eulerian_mean_air_velocity" and "northward_transformed_eulerian_mean_air_velocity" and their aliases.
Among the other typical semantic fragments found in the names, I found:
- Those that hint at the actual 'units' to be expected: "mole", "moles_of", "fraction", "fraction_of".
- Those that indicate causality: "due_to".
- Those that indicate a "medium" or environment: "in_air", "in_sea_water", ...
There are also some that we could classify as borderline cases: those with "product_of" and "derivative_of". (It can still be argued that these terms are only about WHAT, not HOW.)
To sum up, I therefore think that adopting a term like "converted_to" into the standard CF nomenclature would represent a significant change to the principles followed so far.
I would appreciate it if others could contribute their views on these matters, though!
--
Regards,
-+-Ben-+-