Dear all
Jim asked,
"As some examples of the confusing situation we have now, why do we have a
separate word modifier number_of_observations instead of a
number_of_observations_of_X transformation modifier? Why don't we have
variance_of_X or anomaly_of_X transformations (or separate word modifiers
variance or anomaly)? Why isn't there a cell method for standard error? I
can't discern any logic behind the current partitioning."
I've tried to explain how this came about, but perhaps I am not being clear,
so let me try again:
* We introduced the modifiers like number_of_observations for those situations
where it was thought likely that a large number of standard names would need
them. Factorising out this dimension thus avoids a large expansion of the
standard name table. So far, only four anomaly_of names have been requested,
so it seems the right judgement not to have a standard_name modifier for that.
* That was also one of the motivations for cell_methods: there would be vastly
more standard names if we had to include all the cell_methods information too.
The other motivation for cell_methods is that the statistical operations
relate to particular axes. For instance, just "mean" is too vague: does it
mean time-mean, zonal-mean, mean over radiation wavelength, or what? The same
is true for variance. The cell_methods attribute makes this precise.
* There is not a cell method for standard error because it does not relate to
a particular dimension. The standard error is a metadata property of the
individual data. The cell methods statistically describe the variation of the
quantity within cells. These are different purposes.
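
To make the three points above concrete, here is a minimal sketch in Python. It is not taken from any CF software: plain dicts stand in for netCDF variable attributes, the variable labels and the helper names split_standard_name and parse_cell_methods are hypothetical, and the cell_methods parser handles only the simple "axis: method" form. It shows one standard name serving several variables, with the modifier and the cell_methods carrying the rest.

```python
# Illustrative sketch only: plain dicts stand in for netCDF variable attributes.
# The helper names and the variable labels are hypothetical.

def split_standard_name(value):
    """Split a standard_name attribute into (name, modifier or None).

    Assumes the value is a standard name optionally followed by
    whitespace and a single modifier such as number_of_observations.
    """
    parts = value.split()
    name = parts[0]
    modifier = parts[1] if len(parts) > 1 else None
    return name, modifier


def parse_cell_methods(text):
    """Return (axis, method) pairs from a simple cell_methods string.

    Handles only the basic "axis: method" form (including several axes
    sharing one method, e.g. "lat: lon: mean"). Parenthesised comments
    and "where"/"within"/"over" clauses are NOT handled here.
    """
    pairs = []
    pending_axes = []
    for token in text.split():
        if token.endswith(":"):
            pending_axes.append(token[:-1])
        else:
            pairs.extend((axis, token) for axis in pending_axes)
            pending_axes = []
    return pairs


# Three variables sharing one standard name; no new table entries needed.
variables = {
    "tas_time_mean": {
        "standard_name": "air_temperature",
        "cell_methods": "time: mean",
    },
    "tas_zonal_variance": {
        "standard_name": "air_temperature",
        "cell_methods": "longitude: variance",
    },
    "tas_nobs": {
        "standard_name": "air_temperature number_of_observations",
    },
}

for label, attrs in variables.items():
    name, modifier = split_standard_name(attrs["standard_name"])
    methods = parse_cell_methods(attrs.get("cell_methods", ""))
    print(label, name, modifier, methods)
```

The point of the sketch is that "mean" or "variance" never appears without its axis, and that the standard name table needs only the single entry air_temperature for all three variables.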
While you may not agree with the logic, does this help to explain what it is?
If the situation is perceived as confusing and easily misunderstood, I am all
in favour of clarifying it by inserting more explanation and discussion in the
CF standard document. That could be done with a defect ticket. As Philip says,
it could shorten future discussions.
But we can also change the standard, of course. However, changes to existing
attributes are difficult for existing software. I do not think we need or
ought to change the existing attributes. While I appreciate the reason for the
suggestion, I feel that suffixing something to the standard_name to indicate
"something" has been done to it would not really help, because there is almost
*always* something done to it! Cell methods are recommended to be specified in
any case where the default "point" or "sum" is not correct. They should be
present if the quantity is a mean, in particular. A mean is also a
transformation, just like a standard deviation.
I am not convinced yet by the argument that we have to modify the CF standard
because the standard_name may be misunderstood or misused by software which
catalogues or serves datasets. CF introduced the standard_name attribute. If
it's being used now, software must already have been modified to support CF.
Well then, why can't it be modified again to support CF more fully or correctly?
If we explained more clearly in the standard what the intention was, that would
no doubt help with future software design.
Instead of changing what we have, I think we should add to it. It seems to me,
as I've said before, that the existing proposal for "CF strings" summarising
some essential metadata (similar to the earlier proposal for common concepts
in some ways) would solve this problem. It is *that* kind of string, not the
standard name, that the user should be offered when selecting an appropriate
variable. It's a combination of attributes. It's not hard to assemble that
information from the separate attributes, but if that's an obstacle, we could
help software over it by recommending that this extra attribute be included.
Please have a look at https://cf-pcmdi.llnl.gov/trac/ticket/94 and add your
comments on it.
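
As a rough illustration of how little work it is to assemble such a string from the separate attributes, here is a minimal Python sketch. The output format is purely hypothetical and is not the one proposed in the ticket; the function name summary_string and the use of a dict for the attributes are likewise assumptions made for the example.

```python
# Illustrative sketch only. The summary format below is invented for this
# example; it is NOT the format proposed in the "CF strings" ticket.

def summary_string(attrs):
    """Combine standard_name and cell_methods into one descriptive string."""
    parts = [attrs["standard_name"]]
    cell_methods = attrs.get("cell_methods")
    if cell_methods:
        # Append the statistical processing so that, for example, a
        # time-mean is distinguishable from a zonal variance.
        parts.append("(" + cell_methods + ")")
    return " ".join(parts)


time_mean = {"standard_name": "air_temperature", "cell_methods": "time: mean"}
zonal_variance = {
    "standard_name": "air_temperature",
    "cell_methods": "longitude: variance",
}

print(summary_string(time_mean))
print(summary_string(zonal_variance))
```

Two variables with the same standard_name then appear as different choices in a catalogue, without any change to the standard_name attribute itself.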
Best wishes
Jonathan
Received on Tue Apr 02 2013 - 11:05:40 BST