
[CF-metadata] Question from NODC about interplay of standard name modifiers, cell_methods, etc.

From: Steve Hankin <steven.c.hankin>
Date: Mon, 01 Apr 2013 09:12:53 -0700

Hi All,

All interesting questions are questions of balance. This discussion
raises interesting questions. What are the issues we are balancing?

  * On the one side is *technical precision*: how to correctly describe
    the transformations that have been applied
  * Balancing this is *usability*: end users need to easily understand
    and use the data in these files

Our current encoding (standard_name and cell_methods) does well on
technical precision and poorly on usability. A user who selects a
variable with standard_name "sea water temperature", downloads it, and
then realizes only after looking at a plot that it is a variance of sea
water temperature, will understandably feel that she has been misled.
Blaming the user for ignorance or the designer of search engines for
neglect is not a balanced outlook imho. We can foresee this problem (as
demonstrated by this thread). _It is our responsibility as designers
to minimize the opportunities for confusion._

How can we strike this balance? That's the (entirely constructive)
topic that I'd lobby we should be addressing. I've included an
off-the-cuff proposal below in a P.S. I'm sure there are better ideas
out there.

     - Steve

P.S. One proposal: in all cases where a significant transformation (to
be defined) has been applied to the data after it has been measured, the
standard_name gets a generic modifier, say "(transformed)".
             ==> *"sea water temperature (transformed)"*
This will serve as a signal that forewarns users that the variable is
not simply "sea water temperature".
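
As a rough illustration of how a search or display tool might apply this
(a sketch only: the set of "significant" methods and both function names
are hypothetical, since the proposal leaves "significant transformation"
to be defined):

```python
import re

# Hypothetical choice of which cell methods count as a "significant
# transformation"; the proposal above leaves this to be defined.
SIGNIFICANT_METHODS = {"variance", "standard_deviation", "range"}


def methods_in(cell_methods):
    """Return the method names found in a cell_methods string.

    Simplified parsing: parenthesised comments such as "(interval: 1 hour)"
    are dropped, tokens ending in ":" are treated as axis names, and the
    first token after a run of axis names is taken as the method.
    """
    text = re.sub(r"\([^)]*\)", " ", cell_methods)
    methods = []
    previous_was_axis = False
    for token in text.split():
        if token.endswith(":"):
            previous_was_axis = True
        else:
            if previous_was_axis:
                methods.append(token)
            previous_was_axis = False
    return methods


def display_name(standard_name, cell_methods=""):
    """Append the generic "(transformed)" flag when a significant method applies."""
    if any(m in SIGNIFICANT_METHODS for m in methods_in(cell_methods)):
        return standard_name + " (transformed)"
    return standard_name


print(display_name("sea water temperature", "time: variance"))
print(display_name("sea water temperature", "time: point"))
```

With these assumptions the first call is flagged and the second is not;
where to draw that line is exactly the open question.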


On 3/30/2013 6:22 AM, Jonathan Gregory wrote:
> Dear all
> I think Philip's posting points out that this disagreement is partly caused by
> a confusion. I agree with his distinction of two cases.
> Perhaps in Ken's use case, the standard deviation describes the spread of a
> number of measurements that are regarded as samples from a population. The
> difference between the samples is random error, not a dependence on time or
> space that is of interest to the user of the dataset. This also sounds like
> what Nan means by, "We collect in situ data, and I know that MANY of our
> instruments output the mean of several measurements, few do single spot
> samples." If the instrument itself does not output the individual
> measurements, the variation among them as a function of time or space is
> obviously of no geophysical interest.
> I agree that this standard deviation is a kind of measurement property. As Ken
> says, the standard error is usually calculated as the sample standard
> deviation divided by the square root of the number in the sample. However I
> appreciate that you might wish to report the standard deviation instead of the
> standard error. To do this, I agree that we would need a new standard_name
> modifier, which I suggest should be sample_standard_deviation to avoid its
> being confused with any other kind of standard deviation.
> Perhaps that is the answer I should have given to Ken's first question, instead
> of asking whether it was a temporal or a spatial standard deviation. In fact it
> is neither.
> Going on to the wider question, I agree with Ken that a mean is just as much a
> statistical operation as a standard deviation. Only a point measurement (which
> is also one of the cell_methods of Appendix E) is the "true" geophysical
> quantity. All the other methods are statistical ways of representing variation
> of that quantity within the cell. It probably doesn't seem surprising to
> regard a mean as the "same" quantity, nor the mode and median perhaps, but
> maybe you begin to feel uncomfortable when moving on to the maximum and
> minimum, the range (absolute difference between max and min, which is going to
> be added as a cell_method in the next version of CF,
> https://cf-pcmdi.llnl.gov/trac/ticket/65), and finally the standard deviation
> and the variance, the last of which has different units. All these methods
> belong to the same family, and it seems to me it would be arbitrary and
> therefore unsatisfactory to choose a certain level of surprise or discomfort
> in order to decide when it was no longer the "same" geophysical quantity. The
> only solution, I think, is for everyone to learn that the standard name is
> only a *part* of the description of the attribute, as John says.
> John asked whether the difference between cell_methods and standard_name
> modifiers could be clearly stated. My understanding is that standard_name
> modifiers denote ancillary variables (they were introduced to CF at the same
> time), whose purpose is to provide metadata about the individual values of
> another data variable (start of section 3.4), while cell methods indicate the
> statistical methods whereby the data values represent variation within cells.
> This is a difference, but it might not be sufficiently obvious to require them
> to be different features in CF.
> The sample standard deviation *could* be represented by cell_methods, if we
> introduced a notional axis of sample, to index the samples, and then collapsed
> it to size one, like we do for a time-mean or a zonal mean. The sample
> standard deviation would then be described in cell_methods as "sample:
> standard_deviation". Likewise, the number of observations could be regarded as
> a cell_method that counted the size of its (sample) axis. If we do not wish
> to maintain the difference, we could simplify the standard by abolishing
> standard_name modifiers and creating some new cell_methods, some of which
> might be of a different kind from before because they wouldn't refer to a
> particular axis.
> Cheers
> Jonathan

Received on Mon Apr 01 2013 - 10:12:53 BST

