
[CF-metadata] Question from NODC about interplay of standard name modifiers, cell_methods, etc.

From: Jonathan Gregory <j.m.gregory>
Date: Sat, 30 Mar 2013 13:22:24 +0000

Dear all

I think Philip's posting points out that this disagreement is partly caused by
a confusion. I agree with his distinction of two cases.

Perhaps in Ken's use case, the standard deviation describes the spread of a
number of measurements that are regarded as samples from a population. The
difference between the samples is random error, not a dependence on time or
space that is of interest to the user of the dataset. This also sounds like
what Nan means by, "We collect in situ data, and I know that MANY of our
instruments output the mean of several measurements, few do single spot
samples." If the instrument itself does not output the individual
measurements, the variation among them as a function of time or space is
obviously of no geophysical interest.

I agree that this standard deviation is a kind of measurement property. As Ken
says, the standard error is usually calculated as the sample standard
deviation divided by the square root of the number in the sample. However I
appreciate that you might wish to report the standard deviation instead of the
standard error. To do this, I agree that we would need a new standard_name
modifier, which I suggest should be sample_standard_deviation to avoid its
being confused with any other kind of standard deviation.
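
To make the relation between the two quantities concrete, here is a minimal
sketch in Python. The readings are invented for illustration, and it assumes
the usual n-1 denominator for the sample standard deviation.

```python
import math
import statistics

# Hypothetical repeated readings of one quantity (say, temperature in degC).
samples = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0]

n = len(samples)
mean = statistics.mean(samples)

# statistics.stdev uses the n-1 denominator (sample standard deviation).
sample_sd = statistics.stdev(samples)

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = sample_sd / math.sqrt(n)

print(f"n={n} mean={mean:.3f} sd={sample_sd:.3f} se={standard_error:.3f}")
```

Given the number of observations, either quantity can be recovered from the
other, so reporting the standard deviation loses nothing as long as n is also
recorded.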

Perhaps that is the answer I should have given to Ken's first question, instead
of asking whether it was a temporal or a spatial standard deviation. In fact it
is neither.

Going on to the wider question, I agree with Ken that a mean is just as much a
statistical operation as a standard deviation. Only a point measurement (which
is also one of the cell_methods of Appendix E) is the "true" geophysical
quantity. All the other methods are statistical ways of representing variation
of that quantity within the cell. It probably doesn't seem surprising to
regard a mean as the "same" quantity, nor the mode and median perhaps, but
maybe you begin to feel uncomfortable when moving on to the maximum and
minimum, the range (absolute difference between max and min, which is going to
be added as a cell_method in the next version of CF,
https://cf-pcmdi.llnl.gov/trac/ticket/65), and finally the standard deviation
and the variance, the last of which has different units. All these methods
belong to the same family, and it seems to me it would be arbitrary and
therefore unsatisfactory to choose a certain level of surprise or discomfort
in order to decide when it was no longer the "same" geophysical quantity. The
only solution, I think, is for everyone to learn that the standard name is
only a *part* of the description of the attribute, as John says.
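
As an illustration of this family of methods, the sketch below collapses one
cell's worth of point values in several ways. The values and the kelvin units
are invented, and the use of the population form for the standard deviation
and variance is my assumption for the example, not something CF prescribes
here.

```python
import statistics

# Hypothetical point values of one quantity inside a single cell, in kelvin.
cell_values = [280.0, 281.5, 279.0, 283.0, 281.5]

# Each entry maps a method name to (collapsed value, units of the result).
collapsed = {
    "mean": (statistics.mean(cell_values), "K"),
    "median": (statistics.median(cell_values), "K"),
    "mode": (statistics.mode(cell_values), "K"),
    "maximum": (max(cell_values), "K"),
    "minimum": (min(cell_values), "K"),
    "range": (max(cell_values) - min(cell_values), "K"),
    # Population forms are assumed here purely for illustration.
    "standard_deviation": (statistics.pstdev(cell_values), "K"),
    "variance": (statistics.pvariance(cell_values), "K2"),
}

for method, (value, units) in collapsed.items():
    print(f"{method:>18}: {value:.3f} {units}")
```

Every entry is derived from the same point values by the same kind of
operation. Only the variance changes units.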

John asked whether the difference between cell_methods and standard_name
modifiers could be clearly stated. My understanding is that standard_name
modifiers denote ancillary variables (they were introduced to CF at the same
time), whose purpose is to provide metadata about the individual values of
another data variable (start of section 3.4), while cell methods indicate the
statistical methods whereby the data values represent variation within cells.
This is a difference, but it might not be sufficiently obvious to require them
to be different features in CF.
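
To show the two mechanisms side by side, here is a rough sketch with variable
attributes written as Python dictionaries. The variable names (temperature,
temperature_se, temperature_nobs) are hypothetical. The attribute names and
the standard_error and number_of_observations modifiers are the existing CF
ones.

```python
# Data variable: cell_methods says how its values represent variation in cells.
temperature = {
    "standard_name": "sea_water_temperature",
    "units": "degC",
    "cell_methods": "time: mean",
    "ancillary_variables": "temperature_se temperature_nobs",
}

# Ancillary variables: a standard_name modifier marks them as metadata about
# the individual values of the data variable above.
temperature_se = {
    "standard_name": "sea_water_temperature standard_error",
    "units": "degC",
}
temperature_nobs = {
    "standard_name": "sea_water_temperature number_of_observations",
    "units": "1",
}


def modifier(attrs):
    """Return the standard_name modifier of a variable, or None if absent."""
    parts = attrs["standard_name"].split()
    return parts[1] if len(parts) > 1 else None


print(modifier(temperature), modifier(temperature_se), modifier(temperature_nobs))
```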

The sample standard deviation *could* be represented by cell_methods, if we
introduced a notional axis of sample, to index the samples, and then collapsed
it to size one, like we do for a time-mean or a zonal mean. The sample
standard deviation would then be described in cell_methods as "sample:
standard_deviation". Likewise, the number of observations could be regarded as
a cell_method that counted the size of its (sample) axis. If we do not wish
to maintain the difference, we could simplify the standard by abolishing
standard_name modifiers and creating some new cell_methods, some of which
might be of a different kind from before because they wouldn't refer to a
particular axis.
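
A small sketch of what collapsing a notional sample axis would look like,
assuming invented data in which each time step holds four readings. The
function name collapse_sample_axis is hypothetical.

```python
import statistics

# Hypothetical data: rows are time steps, columns index the notional
# "sample" axis.
readings = [
    [10.2, 9.8, 10.1, 10.4],
    [11.0, 11.3, 10.9, 11.2],
    [9.5, 9.7, 9.6, 9.4],
]


def collapse_sample_axis(rows, method):
    """Collapse the sample axis to size one by applying method to each row."""
    return [method(row) for row in rows]


# What "sample: mean" would describe.
sample_mean = collapse_sample_axis(readings, statistics.mean)
# What "sample: standard_deviation" would describe (n-1 form assumed).
sample_sd = collapse_sample_axis(readings, statistics.stdev)
# The number of observations is just the size of the sample axis.
sample_count = collapse_sample_axis(readings, len)

for t, (m, sd, n) in enumerate(zip(sample_mean, sample_sd, sample_count)):
    print(f"time {t}: mean={m:.3f} sd={sd:.3f} n={n}")
```

In this picture the mean, the standard deviation, and the count are all the
same operation (collapse one axis) with a different method, which is why they
could all be expressed as cell_methods.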

Cheers

Jonathan
Received on Sat Mar 30 2013 - 07:22:24 GMT
